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Abstract 


Recent  efforts  to  determine  the  proper  role  of  formal 
statistical  estimation  in  modeling  "System  Dynamics"  models 
show  that  the  parameter  estimates  derived  from  Ordinary  and 
Generalized  Least  Squares  (OLS  and  GLS)  are  highly  sensitive 
to  errors  in  data  measurement,  and  likely  to  prove 
misleading  if  used  as  a  basis  for  selection  of  parameter 
values  or  structural  analysis. 

Using  the  framework  developed  in  this  previous  work,  - 
where  a  nonlinear  feedback  model  generates  synthetic  data, 
which  is  then  used  to  estimate  model  parameters  and  thus 
provide  a  basis  for  the  evaluation  of  an  estimation 
technique,  -  this  thesis  reviews  previous  results  with  OLS 
and  investigates  alternative  estimation  techniques. 

A  review  of  both  econometric  and  engineering 
techniques,  together  with  some  preliminary  experimental 
results  revealed  that  no  econometric  estimation  technique 
proved  capable  of  meeting  the  requirements  of  the  estimation 
of  parameters  in  a  nonlinear  dynamic  feedback  model  in  the 
presence  of  measurement  noise. 

The  only  promising  method,  the  Filtering  Form  of  the 
Maximum  Likelihood  algorithm,  was  found  in  the  engineering 
literature  where  it  is  being  used  in  a  growing  number  of 
appl i cat  ions . 

A  general  FORTRAN  program  was  developed  to  implement 
this  algorithm  and  was  tried  out  on  three  small-scale  linear 
and  nonlinear  models.  The  method  was  found  to  be  capable  of 
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drastically  improving  upon  the  Least  Squares  estimates,  if 
sufficient  Knowledge  about  the  noise  statistics  (which 
present  ident i f i abi 1 i ty  problems  if  they  are  all  to  be 
estimated)  was  available. 

However,  experimentation  on  Forrester's  " Market -Growth" 
model,  while  still  modest  in  size  compared  to  many  System 
Dynamics  models  (nine  equations  and  fifteen  parameters  to 
estimate),  revealed  the  many  limitations  (in  particular  in 
convergence  and  cost)  of  this  algorithm,  that  preclude  its 
use  in  socio-economic  applications. 

In  the  light  of  the  above  results,  alternative  methods 
of  model  validation,  and  in  particular  a  more  formal  use  of 
sensitivity  testing,  are  suggested  for  further  research. 
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NOTATION 


b:  parameter  to  be  estimated 
b :  estimate  of  b 

b :  best  estimate  (if  b  has  already  been  used  as  an  estimate) 
x:  state 

x:  estimate  of  state 
z:  measurement 
z:  predicted  measurement 

A:  matrix  (all  FORTRAN  variables  are  also  capitalized) 

A:  estimate  of  A 
A'  :  transpose  of  matrix  A 
A'1:  inverse  of  matrix  A 
| A | :  determinant  of  A 
tr(A):  trace  of  matrix  A 
u:  vector 

u'  :  transpose  of  a  vector 

u'  :  scalar,  different  from  u 

u  :  time  dependency  (also  indicated  as  u(t)) 

T :  sample  size 

Z ( T ) :  when  defined,  set  of  values  of  z(t)  from  t=0  to  t=T 

x:  mean  value  of  x 

E(x):  expectation  of  x 

Var(x):  variance  of  x 

Cov(x,y):  covariance  of  x  and  y 

p( x ) :  probability  function  of  x 
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plim  x:  probability  limit 

s  :  variance  of  a  scalar 

S  :  covarince  matrix  of  a  vector 

•:  scalar,  vector  or  matrix  multiplication  (used  only  when 
needed  for  more  clarity) 

*:  multiplication  in  FORTRAN 
L:  Likelihood  function 

1:  letter  "l"  (as  opposed  to  number  "1") 

Log:  Natural  logarithm 
d :  partial  differentiation  operator 

NOTES 

1.  Equations  are  numbered  separately  for  each  section.  The 
section  number  appears  before  the  equation  number  only 
when  referring  to  an  equation  in  another  section. 

2.  The  parameter  vector,  equation  and  measurement  noise 
covariance  matrices  are  noted  A,  Q  and  R,  respectively, 
in  the  FORTRAN  equations  and  in  all  experimental 

resul ts . 
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I.  INTRODUCTION 


A.  THE  NEED  OF  OUR  TIMES 

We  live  in  a  time  of  social  crisis.  One  of  the  major 
underlying  causes  for  this  crisis  is  that  the  systems  we 
live  in  have  grown  so  complex  that  no  one  can  claim  to 
understand  fully  the  mechanisms  involved  in  these  systems  or 
predict  their  reaction  to  a  given  policy  change. 

According  to  Warfield  [1],  much  of  the  difficulty  in 
coping  with  complex  systems  comes  about  because  the  rate  of 
change  in  society  is  faster  than  the  rate  at  which  we  can 
respond  effectively. 

And  societal  problems  are  highly  complex: 

"They  involve  multiple  interactions  and  feedback 
among  various  elements  of  society;  they  are  highly 
interdisciplinary  and  transdiscipl inary ,  with 
social,  economic,  political  and  emotional  factors 
intertwined  with  more  quantifiable  factors  of 
physical  technology;  they  cannot  be  shown  easily  to 
be  "solved"  by  experiment;  and  the  implementation  of 
the  presumed  amelioration  to  a  problem  changes  the 
system  in  complex  ways..."  [1] 

It  was  crucial  therefore,  that  "new,  improved  and 
systematic  ways  of  coping  with  such  complex  problems  be 
developed" . . . 

Perhaps  one  of  the  most  promising  applications  of  technology 
available  to  decision  makers  to  help  them  deal  with 
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increasingly  complex  societal  problems  could  be  broadly 
labelled  "Systems  Analysis",  the  first  truly 
i nterdi sci p 1 i nary  approach,  stemming  out  from  the  post-war 
field  of  "General  Systems  Theory"  [2],  and  the  closely 
related  fields  of  "Cybernetics"  and  "Information  Theory" 

(see  for  example  [3]  and  [4]). 

The  growing  family  of  world  models  associated  with 
Forrester  [5],  Meadows  et  al .  [6],  Mesarovic  and  Pestel  [7] 
(to  quote  only  a  few)  have  demonstrated  the  power  of  systems 
concepts  and  modeling  for  organizing  one's  thought  and  for 
tackling  seemingly  intractable  problems. 

B.  TWO  APPROACHES  TO  MODELING 

There  are  two  main  trends  in  mode  1 -bui Idi ng ,  based  on 
entirely  different  approaches:  "Econometrics"  and  "System 
dynamics" . 

These  two  types  of  models  differ  both  in  their  construction 
and  in  their  purpose. 

A.1  Econometric  models 

Econometric  models,  derived  from  the  methods  of  the 
social  sciences,  rely  heavily  on  formal  statistical 
techniques  and  this  tends  to  influence  their  structure. 
Because  the  available  estimation  techniques  are  mostly 
linear,  econometricians  will  tend  to  linearize  any  nonlinear 
relationship,  to  omit  all  variables  which  cannot  be  measured 
or  for  which  data  is  not  available  and  all  feedback 
relationships  which  cannot  be  derived  from  statistical  data. 
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The  regression  procedures  will  also  often  reject  any 
hypothesized  relationship  that  has  not  played  a  significant 
role  in  the  past  (though  it  may  be  crucial  in  the  future). 
The  usual  explanatory  equation  expresses  each  endogenous 
variable  as  a  function  of  its  assumed  direct  causes  : 


y,  =  f,-  (y,  -y. 


y- 


I  -  I 


>x, 


x*> 


1  ) 


where  xj  are  predetermined  variables  (exogenous  and  lagged 
endogenous).1  Thus,  an  econometric  model  is  a  system  of  G 
equations  explaining  the  G  endogenous  variables. 
Specification  of  the  structure  of  an  econometric  model  is 
the  most  crucial  step  in  that  later  statistical  estimation 
and  validation  procedures  assume  that  the  structure  is 
cor rect . 

Brown  [8]  defines  specification  as  the  process  of 
deciding  on  the  hypothetical  structure  of  a  model, 
preparatory  to  testing  this  with  observed  data,  and  finally 
estimating  its  parameters.  Clearly  then,  the  first  major 
part  of  specification  is  to  decide  which  variables  belong  to 
which  side  of  the  equations.  The  next  phase  of  specification 
involves  deciding  on  the  form  of  each  equation  (very  often 
chosen  as  1 i near ) . 

"  Good  specification  is  vital  for  econometric 
model -bui ldi ng .  Without  careful  attention  to  this 
phase  of  the  work,  we  can  easily  settle  for  spurious 


1  For  definition  of  terms  see  page  27 
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relations,  selected  by  searching  only  for  good  fits 
and  high  correl at  ions .  With  small  samples  of  data, 
it  is  easy  to  hammer  out  hypotheses  to  conform 
closely  to  the  sample.  Specification  is  then  biased 
toward  the  particular  sample..."  [8] 

However,  the  major  decision  criteria  come  directly  from 
economic  theory. . . 

The  way  in  which  careful  specification  can  prevent 
acceptance  of  faulty  structures  is  to  let  economic 
theory  dominate  over  the  statistical  tests,  when 
these  are  being  made  from  small  samples.  Thus,  when 
the  statistical  tests  favour  an  apparently  inferior 
hypothesis,  the  econometrist  should  put  more  weight 
on  what  he  believes  to  be  superior  theoretical 
specification,  while  checking  his  preferred  theory 
for  possible  flaws.  Decisions  and  judgements  of  this 
kind  of  course  require  a  considerable  experience  and 
expertise  in  the  workings  of  the  macro-economy..." 

[8] 

Thus,  although  econometricians  claim  their  models  are 
based  on  measured  data,  the  judgemental  process  in  the 
choice  of  an  equation  predominates.  Because  econometric 
models  do  not  attempt  to  explain  the  internal  mechanisms 
that  generate  the  observed  data,  they  typically  contain  many 
exogenous  variables  (variables  that  are  not  determined  by 
the  model ) . 

Many  large-scale  econometric  models  of  national  and 
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provincial  economies  have  been  constructed  in  the  last  two 
decades  and  are  mostly  used  by  government  agencies  and 
central  banks  in  short-term  forecasting  and  policy 
simulation.  Although  they  perform  well  in  short-term 
forecasting,  they  do  not  generally  pick  up  major  turning 
points,  because  they  assume  the  continuation  of  present 
trends,  do  not  attempt  to  explain  mechanisms  and  are  tested 
only  with  mild  changes  in  policy  variables. 

A. 2  System  dynamics  models 

The  structure  of  system  dynamics  models,  on  the  other 
hand,  is  not  limited  by  the  difficulties  inherent  in  formal 
statistical  estimation  techniques.  System  dynamics  models 
rely  on  a  philosophy  of  structure  in  systems  that  results  in 
many  factors  that  make  formal  estimation  very  difficult, 
e.g.,  strong  non  1 i near i t i es ,  multiple  feedback  loops, 
unmeasured  variables,  delays,  etc... 

Because  of  its  relative  novelty,  this  approach  will  be 
developed  in  somewhat  more  detail  here  than  the  econometric 
one . 

Most  of  the  following  description  of  system  dynamic's 
methodology  is  based  on  "Urban  dynamics"  [9],  but  all 
Forrester' s  models  basically  have  the  same  features. 

"  In  the  1960's,  Professor  Jay  Forrester  of 
Massachussets  Institute  of  Technology  pulled 
together  some  existing  techniques  of  feedback 
control,  mixed  in  some  of  Norbert  Wiener's 
principles  of  cybernetics,  added  inventions  of  his 
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own,  and  developed  a  methodology  that  is  Known  as 
System  Dynamics.  "  ( McLeod ,[ 1 0 ] ) 

Structure  in  a  system  is  seen  as  having  four 
significant  hierarchies,  as  illustrated  in  Figure  1-1  (from 
Sage  [11]). 

1.  The  closed  boundary 

2.  The  feedback  loops  as  the  basic  structural  elements 
within  the  boundary 

3.  Level  (state)  variables  representing  accumulations 
within  the  feedback  loops,  and  Rate  (flow)  variables 
representing  activity  within  the  feedback  loops 

4.  Goal ,  observed  condition ,  detection  of  discrepancy , 
action  based  on  discrepancy  as  components  of  a  rate 
var i able . 

The  boundary  of  a  system  is  chosen  to  include  those 
interacting  components  necessary  to  generate  the  modes  of 
behavior  of  interest :  this  means  that  the  latter  are  not 
imposed  from  the  outside  but  created  within  the  boundary. 

The  system  can  however  be  influenced  by  external  forces,  but 
these  should  be  random,  not  give  the  system  its  intrinsic 
growth  and  stability  character i st i cs . 

The  dynamic  behavior  of  the  system  is  generated  within 
the  feedback  loops,  which  are  the  fundamental  building 
blocks  of  systems.  Figure  1-2  shows  the  simplest  possible 
feedback- loop  structure. 

A  feedback  loop  is  a  structure  within  which  a  decision 
point  -  the  rate  equation  (symbolized  by  a  valve)  - 
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Level  1 
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FIGURE  1-1:  Four  elements  of  structure 


the  System  Dynamics  methodology. 
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FIGURE  1-2:  A  simple  feedback  loop. 
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the  flow  or  action  stream.  The  action  is  accumulated 
(integrated)  to  generate  a  system  level.  Information  about 
the  level  is  the  basis  on  which  the  flow  rate  is  controlled. 
The  "cloud"  describes  an  unlimited  quantity. 

The  rate  equations  are  the  statements  of  system  policy. 
They  determine  how  the  available  information  is  converted  to 
an  action  stream.  Within  each  rate  equation  there  is 
implicitly  or  explicitly  a  goal  toward  or  away  from  which 
that  decision  point  in  the  system  is  striving.  There  is  also 
a  process  whereby  the  observed  condition  of  a  system  is 
detected.  A  rate  equation  states  the  discrepancy  between  the 
goal  and  the  observed  condition.  And  finally,  the  rate 
equation  states  the  action  that  will  result  from  the 
di screpancy . 

Once  all  the  rate  and  level  variables  have  been 
identified  (into  three  major  sectors:  population,  industry 
and  housing,  in  the  case  of  Urban  Dynamics,  for  example)  and 
carefully  defined,  once  the  causal  relationships  are 
determined,  the  various  building  blocks  are  interconnected 
using  a  causal  loop  diagram  to  form  a  flow  diagram,  a  part 
of  which  is  illustrated  in  Figure  1-3. 

The  rate  equations  are  postulated  and  the  parameters 
specified  and  then  all  the  equations  are  written  in  a  DYNAMO 
language,  typically  using  a  set  of  "normal"  conditions  which 
are  modified  by  multipliers,  representing  the  deviation  of 
the  actual  system  from  the  "normal",  as  illustrated  below. 
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FIGURE  1-3:  Flow  diagram  of  part  of  the  Urban  Dynamics  model 
( f  rom  [ 9 ]  ) 


A  typical  rate  equation  is  the  one  describing  the 
arrival  of  underemployed  into  the  urban  area  : 

UA.KL  =  (U.K  +  L.K) (UAN) (AMMP.K) 

UAN  =  .05 
where 

UA-  =  UNDEREMPLOYED  ARRIVALS  (MEN/YEAR) 

U  =  UNDEREMPLOYED  (MEN) 

L  =  LABOR  (MEN) 

UAN  =  UNDEREMPLOYED  ARRIVALS,  NORMAL  ( FRACT ION/ YEAR ) 

AMMP  =  ATTRACTIVENESS- FOR -MIGRATION  MULTIPLIER  PERCEIVED 
(DIMENSIONLESS) 

The  firtst  term,  (U+L),  is  the  sum  of  the  underemployed 
and  labor  populations.  The  second  term,  UAN,  is  a  "normal" 
multiplier  which  represents  the  fraction  of  existing 
inhabitants  (U+L)  that  would  come  to  the  area  each  year. 

With  a  UAN  value  of  .05,  the  statement  is  made  that  under 
normal  circumstances  the  inflow  to  the  area  would  represent 
5%  of  (U+L).  AMMP  is  a  modulating  term  representing  the 
attractiveness  of  the  area  compared  to  normal.  As  the 
attractiveness  changes,  the  value  of  AMMP  changes  taking 
values  less  or  greater  than  1.  K  and  L  =  K+1  represent 
points  in  time.  The  rate  UA  is  constant  during  the  small 
unit  interval  KL,  and  its  value  during  this  period  depends 
on  the  values  of  U,  L,  AMMP  at  time  K.  All  the  equations  of 
the  system  are  thus  iteratively  updated  and  generate  a  time 
path . 

Once  the  whole  program  is  thus  formulated  and  verified, 
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the  model  is  ready  to  run.  A  reference  is  chosen  and  then 
various  changes  in  equations  and  parameters  are  made  to  test 
the  sensitivity  of  the  model.  Changes  in  policy  variables 
are  also  performed,  each  new  run  being  compared  to  the 
reference  run  in  order  to  analyze  the  effects  of  that  given 
change . 

Typically,  the  models  are  highly  i nsensit ive  to  most  of 
the  parameter  changes ;  "common"  or  "traditional"  policies 
designed  to  solve  the  system' s  problems  are  found  to  be 
ineffective  or  making  matters  worse;  a  few,  unexpected, 
couter i ntui t i ve  policies  are  shown  to  be  more  likely  to 
deeply  alter  the  system's  behavior  in  the  desired  way... 

Similar  patterns  were  followed  in  "Industrial 
Dynamics"  [12]  and  in  "World  Dynamics"  [5]  which  led  to  the 
well-known  "Limits  to  growth"  [6]. 

All  of  these  models  caused  a  major  break-through  in 
traditional  ways  of  thinking  and  have  been  highly  praised  by 
the  scientific  and  decision-making  communities.  All  of  them 
have  also  been  criticized  because  of  the  many  over¬ 
simplifications  made  (see  for  example  Sage  [11],  Ansoff  and 
Slevin  [13],  and  the  "Sussex  Group"  [14]). 

It  appears  from  Forrester's  writing  that  "expert 
opinion"  rather  than  real  data  were  often  used  in 
constructing  the  many  nonlinear  relationships  and  in 
evaluating  the  multipliers  used.  He  has  been  criticized  for 
developing  a  model  with  built-in  bias,  for  not  explicitly 
stating  a  value  system,  and  for  making  untested  and  possibly 


■ 


13 


invalid  assumptions... 

Although  some  of  the  critiques  are  justified  -  no  model 
is  perfect,  specially  at  this  early  stage  of  the  technology 
they  have,  in  general,  not  been  constructive,  reflecting 
conservationist  attitudes,  as  it  appears  from  the  responses 
to  these  critiques  ( see  Forrester  [15],  Meadows  et  al .  [14] 
To  summarize,  in  Forrester' s  words  [16]: 

Compared  to  an  econometric  model,  a  system-dynamic 

model : 

1.  Makes  greater  use  of  descri pt ive  information  and 
managerial  and  political  experience, 

2.  Incorporates  a  broader  range  of  variables  and 
encompasses  the  many  relevant  disciplines  outside  of 
economics , 

3.  Uses  numerical  data  from  real  life  in  a  different  way  - 
in  model  construction  to  complement  descr i pt ive 
information,  in  validation  to  compare  with  the 
corresponding  output  data  from  the  model, 

4.  Generates  social  and  economic  fluctuations  and  growth 
from  the  internal  feedback  structure  without  using 
exogenous  variables  to  cause  the  change  to  occur, 

5.  Includes  important  social  and  psychological  variables 
for  which  statistical  data  are  not  avai lable, 

6.  Explains  how  structural  change  occurs  as  the  economy 
moves  into  new  modes  of  behavior  that  are  not 
represented  in  past  time-series  data. 

Facilitates  incorporating  the  wide  range  of  nonlinear 
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structures  that  generate  so  much  of  observed  real-world 
behavior , 

8.  Emphasizes  the  conservation  of  flows  by  including  the 
buffer  stocks  that  decouple  the  instantaneous  flow 
rates , 

9.  Distinguishes  more  sharply  between  real  variables,  their 
money  value,  and  information  about  them  to  capture  the 
dynamic  interactions  between  these  elements  of  a 
real-world  system, 

10.  Ecourages  construction  of  a  deeper  substructure  of 
feedback  loops  to  represent  causal  mechanisms  underlying 
macroeconomic  behavior, 

11.  Organizes  structure  so  that  each  parameter  has 
independent  real-life  meaning  in  the  operating  world  and 
can  be  invi dually  drawn  from  and  checked  against 
descri pt  ive  and  quant  it at ive  information  available  at 
the  place  in  the  real  system  to  which  the  parameter 
appl ies, 

12.  Serves  as  a  more  effective  communication  medium  for 
resolving  disagreements  because  of  the  way  both  model 
structure  and  parameters  correspond  with  descriptive 
knowledge  in  the  operating  world, 

13.  Places  more  emphasis  on  the  importance  of  internal 
structure , 

14.  Focusses  more  on  understanding  the  reasons  for  observed 
behavior  and  on  developing  policies  to  produce  better 
behavior,  and  focusses  less  on  prediction, 
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15.  Combines  over  a  greater  time  span  short-term  with 
long-range  human  objectives, 

16.  Permits  a  wider  diversity  of  contact  between  the  model 
and  the  real  world  to  make  validation  more 
persuasive ..." 

This  lenghty  presentation  of  the  main  features  of 
system  dynamics  models  gives  an  idea  of  the  theory  on  which 
they  are  built,  namely  that  "the  real  social  systems  belong 
to  the  class  of  nonlinear  i ntegrat ion- feedback  systems"  , 
theory  which  according  to  Senge  [17],  is  a  necessary 
foundation  for  any  analysis  of  the  usefulness  of  a 
statistical  technique  for  modeling  practice  . 

"The  generality  of  an  analysis  of  estimator  accuracy 
depends  on  whether  or  not  the  data-gener at i ng  model 
in  the  experimental  laboratory  (or  the  assumed  "true 
system"  in  the  theoretical  analysis)  belongs  to  the 
same  class  of  systems  as  real-life  social  systems. 

If  the  data-generat i ng  model  does  not  belong  to  the 
proper  class  of  systems,  the  investigator  has  little 
basis  for  inferring  from  the  laboratory  analysis  how 
a  statistical  technique  will  perform  in  real-life 
appl i cat  ions . "  [17] 

C.  SENGE' S  ATTEMPT 

In  a  recent  paper,  [18],  Peter  SENGE  has  attempted  to 
clarify  issues  underlying  the  controversy  over  statistical 
parameter  estimation  in  feedback  models. 
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Experiments  with  Ordinary  Least  Squares  (OLS)  and 
Generalized  Least  Squares  (GLS)  estimation,  using 
Forrester's  'Market  growth'  model  [19],  show  that  the 
parameter  estimates  derived  from  these  methods  are  highly 
sensitive  to  errors  in  data  measurement,  especially  when  a 
model's  feedback  structure  is  not  completely  known.  In  doing 
so,  he  has  developed  an  experimental  procedure  to  evaluate 
the  estimation  techniques,  and  a  range  of  experimental 
conditions  exploring  the  effect  on  estimation  results  of  a 
particular  facet  of  real  life  mode  1 -bui ldi ng  or  data 
measurement . 

This  framework  provides  useful  guidelines  for  conducting 
further  investigation  of  these  issues. 

He  concludes  by  suggesting  that  the  "  ability  of 
alternative  estimation  techniques  to  improve  upon  the 
performance  of  OLS  and  GLS  should  be  explored.  Statistical 
methods  which  warrant  investigation  include  alternarive 
econometric  techniques,  as  well  as  techniques  developed  in 
such  fields  as  engineering..."  [18],  and  also  that  the 
accuracy  of  these  techniques,  including  OLS  and  GLS,  should 
be  examined  for  alternative  feedback  models. 

D.  THIS  THESIS 

It  is  in  the  light  of  all  the  above  facts  that  the 
present  investigation  proposes  to  try  alternative 
econometric  and  engineering  methods  to  estimate  the  market 
growth  model . 
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As  a  first  step,  sections  II,  III  and  IV  attempt  to 
review  the  main  trends  in  system  identification,  both  in  the 
econometric  and  the  engineering  literatures.  Although  there 
are  many  similarities  between  the  two  fields,  they  have 
developed  in  different  directions,  used  different 
terminologies,  and  emphasized  different  types  of  models. 
Current  research  seems  to  be  moving  in  the  direction  of 
bridging  the  gap  between  the  "two  disciplines,  particularly 
in  the  attempts  to  use  Kalman  filtering  concepts  in 
econometrics,  but  by  and  large,  current  research  areas  still 
differ  quite  largely. 

The  effect  of  noise  on  estimation  results  are  given 
particular  attention. 

The  most  useful  econometric  method,  in  our  case,  is 
found  to  be  the  LISREL  model2.  However,  although  it  allows 
for  Maximum  Likelihood  estimation  when  both  equation  and 
measurement  errors  are  present,  this  model  does  not 
accomodate  dynamic  or  nonlinear  systems. 

The  most  promising  estimation  algorithm  is  found  in  the 
engineering  literature  and  is  known  as  the  "Filtering  form 
of  the  Maximum  Likelihood  algorithm".  The  detailed 
explanation  of  this  approach  is  delayed  until  section  IX. 

The  ident i f i abi 1 i ty  problem,  which  is  the  cause  of  much 
trouble  in  multivariate  system  estimation,  is  presented 
briefly  in  section  V.  Although  this  is  a  main  topic  of 
research  in  both  fields,  no  practical  way  of  checking  for 
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i dent i f i abi 1 i ty ,  other  than  an  analysis  of  singularity 
problems  is  usually  provided... 

In  a  second  part,  section  VI  describes  the  experimental 
framework;  section  VII  is  an  attempt  at  reproducing  Senge' s 
experiment  in  order  to  assertain  that  the  results  obtained 
are  comparable.  Section  VIII  describes  the  LISREL  model 
which  was  tried  on  experimental  smaller  order  dynamic  models 
and  then  on  the  Market  model;  but  the  results  were 
disappointing  and  are  not  presented  here.  In  section  IX,  the 
theory  of  the  Filtering  form  of  the  Maximum  Likelihood 
algorithm  is  presented,  along  with  a  simple  one  dimensional 
example.  The  equations  for  the  general  case  are  also  derived 
here . 

Section  X  describes  the  main  emphasis  of  this  thesis: 
the  development  of  a  computer  program  for  the  estimation  of 
parameters  using  the  Filtering  form  of  the  Maximum 
Likelihood  algorithm  and  the  experimental  results  obtained 
with  this  program  on  four  models  of  varying  complexity.  The 
performance  of  this  algorithm  is  found  to  be  very  acceptable 
for  small  models,  specially  in  the  light  of  the  OLS  results, 
despite  the  problems  in  choosing  values  for,  or  estimating 
the  various  noise  covariance  matrices.  It  does  however 
present  many  problems  if  used  on  a  larger  and  unstable  model 
such  as  the  Market -Growth  model. 

Finally,  section  XI  presents  the  conclusions  for  this 
set  of  experiments,  examines  the  applicability  of  such  a 
program  for  "Real-World"  problems,  and  suggests  an 
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alternative  approach  for  model  validation. 


II.  SYSTEM  IDENTIFICATION:  AN  OVERVIEW 

The  use  of  mathematical  models  to  describe  the  behavior  of  a 
physical  phenomenon  is  now  common  practice  in  a  great  number 
of  scientific  disciplines.  More  recently  however,  the 
systems  under  study  have  drastically  grown  in  complexity  and 
have  also  extended  to  social  and  economic  disciplines.  The 
need  for  control  engineers  to  have  accurate  Knowledge  of 
their  plants  and  for  socio-economic  mode  1 -bui Iders  to 
present  reliable  models  of  systems  to  decision-makers  has 
caused  an  increasing  interest  in  the  techniques  of 
mode  1 -bui ldi ng . 

If  all  the  phenomena  under  study  were  Known  with 
certitude,  and  if  exact  Knowledge  of  these  systems  was 
possible,  exact  calculation  of  model  parameters  would  also 
be  possible  and  we  would  have  deterministic  models. 

Unfortunately,  there  are  usually  many  unknown  factors 
(lack  of  exact  Knowledge  of  the  system,  or  even  if  this  is 
possible,  there  is  usually  measurement  noise  or 
disturbances,  etc...).  Being  incapable  of  writing  exact 
equations  or  to  predict  behavior  with  certitude,  we  have  to 
refer  to  probabilistic  methods.  Based  on  observations,  we 
have  to  build  physical  insight  into  the  problem  being 
studied  and  come  up  with  with  a  model ,  defined  in  Eykhoff 
[20]  as: 

"  a  representation  of  the  essential  aspects  of  an 
existing  system  (or  a  system  to  be  constructed) 
which  presents  Knowledge  of  that  system  in  a  usable 
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form " . 

The  revolutionary  work  of  Kalman  [21]  and  others  has 
led  to  modern  control  theory  in  which  the  models  used  are 
with  a  few  exceptions  parametric  models  in  terms  of  state 
equations.  The  desire  to  determine  such  models  from 
experimental  data  has  renewed  the  interest  of  control 
engineers  in  parameter  estimation  and  related  techniques. 

Independently,  econometricians  have  developed  a  number 
of  tools  for  their  econometric  model -bui ldi ng  (mainly 
regression  techniques)  and  created  a  wealth  of  statistical 
techniques  for  the  estimation  and  testing  of  their  models 
(see  Thei 1  [22], for  example). 

Our  knowledge  about  systems  is  often  in  terms  of 
time-series,  i.e.,  a  set  of  successive  values  for  t  =  1...T 
for  the  input  u(t)  and  the  measured  output  z(t). 

Some  "a  priori"  knowledge  about  the  system  and  even 
some  approximate  knowledge  of  the  values  of  the  parameters 
in  the  model  are  available  in  some  cases.  Our  main  concern 
here,  given  u(t)  and  z(t)  on  a  period  of  time,  is  to 
determine  the  (differential)  equations  that  describe  the 
system's  behavior. 

A  model  represents  three  types  of  knowledge: 

1 .  Structure 

2.  Parameters'  values 

3.  Values  of  dependent  variables 
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The  process  of  "System  Identification"3  is  then  an 
iterative  process  to  determine  these  three  types  of 
Knowledge  since  they  are  highly  interdependent,  and  has  been 
illustrated  by  Box  and  Jenkins  [23]  as  follows: 

1.  From  the  interaction  of  theory  and  practice,  a 
successful  class  of  models  for  the  purposes  at  hand  is 
cons i dered . 

2.  Because  this  class  is  too  extensive  to  be  conveniently 
fitted  directly  to  the  data,  rough  methods  for 
identifying  subclasses  of  these  models  are  developed. 
These  employ  data  and  Knowledge  of  the  system  to  suggest 
an  appropriate  class  of  "parsimonious"  models  which  may 
be  tentatively  entertained. 

3.  The  tentatively  entertained  model  is  fitted  to  data  and 
its  parameters  Estimated 

4.  -Diagnostic  checks  are  applied  with  the  object  of 

uncovering  possible  lack  of  fit  and  diagnosing  the 
cause . 

The  process  is  illustrated  in  Fig  1 1  -  1 . 

Although  this  is  intented  for  a  " Box- Jenkins"  type  of 
statistical  identification,  it  gives  an  idea  of  the  general 
identification  process. 

The  field  of  system  identification  and  parameter 
estimation  is  rather  complex.  This  complexity  is  due  to  the 
variety  of  applications,  in  fields  as  different  as 

3  The  term  "identification"  is  used  here  in  its  engineering 
sense.  The  econometric  equivalent  is  specification  and 
estimation",  the  term  "identification"  being  used 
exclusively  in  the  context  of  the  i dent i f i abi 1 i ty  problem. 
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FIGURE  II-1:  The  process 
( f rom[ 23 1 ) 


of  System  Identification 
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communications,  power,  mechanical,  aeronautical  and  chemical 
engineering,  physics,  geology,  economics,  biology  and 
medicine;  to  the  variety  of  goals,  whether  of  research, 
design  or  control;  to  the  variety  of  conditions:  what  Kinds 
of  signals  ape  available,  what  "a  priori"  Knowledge  about 
the  system  there  is,  etc... 

The  models  can  be  single  input-single  output  (SISO), 
multiple  input-multiple  output  ( M I  MO )  or  combinations 
thereof,  linear  or  nonlinear  (in  the  parameters  or  in  the 
variables),  continuous  or  discrete,...  An  excellent  survey 
of  all  these  models  and  approaches  is  given  by  EyKhoff  [20]. 
Each  has  been  extensively  studied,  and  different  approaches 
and  algorithms  developed  for  each  specific  case. 

The  two  forms  which  are  general  enough  to  include  the 
major  part  of  the  identification  practice  are  the 
engineering  " state- space"  model  and  the  econometric 
" simul taneous  equat  i ons "  mode  1 . 

The  econometric  and  engineering  prat  ices  of  system 
identification  have  developed  in  parallel  for  a  long  period 
of  time.  Although  in  general  they  use  well  Known  statistical 
techniques,  differences  in  the  vocabulary,  notation,  models 
considered,  availability  of  data,  purposes  of  identification 
and  the  emphasis  of  research  made  cross -communi cat  ion  very 
difficult  or  non-existent.  It  is  only  in  the  past  few  years 
that  some  attempts  have  been  made  at  bridging  this  gap.  In 
this  respect,  Mehra' s  outstanding  article  in  Annals  of 
Economic  and  Social  Measurement  [24]  is  worth  mentionning. 
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To  make  communcation  between  researchers  in  the  two  field 
easier,  he  proposes  the  state-space  model  of  contol  theory 
as  a  unifying  link  and  discusses  the  relationship  of  this 
model  to  the  "simultaneous  equation"  model  as  well  as  the 
use  of  process  and  measurement  noise  in  the  econometric 
context.  He  also  develops  an  algorithm  for  model  structure 
determination  and  estimation  of  parameters.  Mathematical 
economists  have  manifested  an  increasing  interest  in  control 
theory  and  its  applications  to  econometrics  (see  [25],  [26] 
for  example).  In  particular,  a  number  of  articles  examine 
problems  of  parameter  estimation  for  linear  and  nonlinear 
econometric  models  from  the  point  of  view  of  Kalman 
filtering.  ([24],  [27],  [28],  [29]). 

The  next  two  sections  attempt  to  review  briefly  the 
main  trends  in  both  fields  as  well  as  the  emphasis  of 
current  research. 
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III.  SOME  ECONOMETRIC  ESTIMATION  TECHNIQUES 

In  his  experiment,  Senge  [17]  considers  only  OLS  and  GLS 
which  are  estimation  techniques  derived  for  the  standard 
single-equation  linear  regression  model  and  for  the 
combination  of  several  linear  relations,  respectively. 

However,  most  econometric  models  attempt  to  model  the 
general  interdependence  of  economic  phenomena  and  lead  to 
systems  of  simultaneous  equations  where,  in  general,  each 
equation  has  several  dependent  variables  which  also  occur  in 
other  equations.  A  number  of  methods  have  been  developed  to 
deal  with  this  situation  (see  for  example  [ 22 ] , [ 30 ] , [ 3 1 ] ) , 
some  of  which  will  be  presented  here. 


A.  NOTATIONS  AND  ASSUMPTIONS 


The  basic  "simultaneous  equations"  econometric  model  is 
a  system  of  G  equations  in  G  endogenous  variables,  K 


predetermined  variables,  and 
written  in  implicit  form  as: 

Fj  (y,  •  •  •  -  Yg  >  xi  ’  •  •  •  xk  ’  ai 
where : 


parameters,  and  can  be 
.  .  . afi )  =  0  j=1...G  (1) 


v.  i=1...G  are  current  endogenous  variables:  they  are 

J  i 

determined  by  the  system. 

x  •  k=1  K  are  predetermined  variables,  they  contain: 
k 

-  current  and  lagged  exogenous  variables 

-  lagged  endogenous  variables, 
a,  :  1=1 .. .S  is  a  set  of  parameters 
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If  the  system  is  assumed  to  be  linear  (which  is  the 
case  of  most  econometric  models),  equation  (1)  becomes: 

;  •y;  +  Cj,-  »x;=0  j=  1  .  .  .  G  (2) 


If  we  have  T  obsevations  on  this  system,  introducing 
the  observation  index  t  =  1 . T  and  the  disturbance  terms4 


u,  ,  the  system  for  observation  t  is: 


21  bj''  *y.t +  ZL  Cj,  ,xit  =  u 

i  =  I 


j=  1  .  .  . G 


(3) 


or,  in  matrix  notation: 

B*yt  +  C»Xt  =  y  t;  t=1....T  (4) 

where : 

B  is  (GxG)  nonsingular 
C  i s  ( GxK ) 

yt  is  ( Kx  1  )  ,  x_t  is  and  yt  is 

Finally,  the  notation  for  the  entire  system  of  T 
observations  is: 

Y.B7  +  X»C;  =  U  (5) 

where : 

Y  =  (y ( . yT)'  is  (TxG)  : 

matrix  of  observations  on  the  jointly  dependent 
var i ab 1 es , 

X  =  (xf . xT)'  is  (TxK)  and  is  assumed  to  have  rank 

K: 


4The  term  "structural  disturbance"  or  "residual"  is 
introduced  in  econometric  equations  to  account  for  the 
neglected  determining  factors  explaining  the  left-hand 
variable.  The  assumptions  on  these  residuals  are  presented 

below 
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matrix  of  obsevations  on  the  predermi need  variables 

U  =  (u, . u  T ) '  is  (  TxG )  : 

matrix  of  structural  disturbances 
The  basic  assumptions  on  the  disturbance  terms  are: 

1.  Zero  mean:  E ( u t )  =  0  t=1...T 

2.  Unknown  but  finite  covariance  matrix:  E(ut,u't)  =  2T 

3.  No  autocorrel at  ion :  E(u.,u\  ..  )  =  0  i  f  s  *  0 

4.  No  correlation  with  predetermi ned  variables: 

E(u  ,x#fc)  =  0 

B.  SOME  ESTIMATION  TECHNIQUES 

B.1  Desirable  properties  for  an  estimator 
Theoretically,  our  knowledge  about  the  parameter  of  a 
model,  b,  whose  estimate  is  b  could  be  expressed  in  terms  of 
the  probability  density  function  p(b,T)  if  T  is  the  number 
of  data  points  available.  In  practice  however,  this  function 
is  reduced  to  its  most  significant  characteristics: 

-  "expected  value"  :  E ib) 

-  "bias"  :  E  (£>)-  b 

-  "covariance"  :  Cov(b)  =  E{(b-E(b) )»{b-Eib)  )'  } 

Some  desirable  properties  of  estimators  are  then: 

-  "Unbiasedness":  if  for  each  t:  E (b)  =  b 

-  "Consistency":  if  lim  P ( | b  -  b |  <£  )  =  1  or  plim  b  =  b 

t  — ¥  oO 

-  "Efficiency":  if  for  all  unbiased  estimates  c, 

Cov(b)  <  Cov(c) 
or  deticov  (c)  -  Cov(b)}  >  0 
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i.e.  b  has  "minimum  variance". 

If  the  first  and  third  properties  hold  only  for  t  -> 

,  they  are  "asymptotic". 

Different  methods  of  estimation  of  identified  equations 
(for  the  " i dent i f i abi 1 i ty"  problem,  see  section  V)  for 
simultaneous  equation  models  have  been  proposed  in  the 
literature.  These  can  be  classified  under  the  category  of 
single  equation  methods  (or  Limited  Information  methods)  and 
systems  methods  (or  Full  Information  methods). 

In  single  equation  methods,  we  estimate  each  equation 
separately  using  only  the  information  about  the  restrictions 
on  the  coefficients  of  that  particular  equation. 

In  systems  methods,  we  estimate  all  equations  jointly 
using  the  restrictions  on  the  parameters  of  all  equations  as 
well  as  the  covariances  of  the  residuals. 

B.2  Single  equation  methods 

The  most  commonly  used  single  equation  methods  are 
Ordinary  Least  Squares  (OLS),  Indirect  Least  Squares  (ILS), 
Two-Stage  Least  Squares  ( 2SLS ) ,  and  Limited  Information 
Maximum  L i ke 1 i hood ( L I  ML ) . 

For  the  purposes  of  single  equation  estimation,  the 
equation  for  fixed  j  =  J  <  G,  in  (3)  can  be  written,  after 
normalization,  as: 


L 


(6) 


where  y, 


are  the  other  endogenous  variables  included 


a 


' 


30 


in  equation  J, 

*,t . xu  are  the  predetermined  variables  included  in 

equation  J, 

and  the  bT|  and  c^.;  are  correspondi ng  coefficients5. 
Introducing  the  observations  for  t  =  1...T,  we  get: 

=  Yj  ®b j  +  Xj®c j  +  u  j  (7) 

where  Yj  (TxH)  and  Xj  (TxL)  are  respectively  matrices 
of  other  endogenous  and  predetermi ned  variables  included  in 
equation  J,  and  bj  ( Hx 1 )  and  c7  ( L x 1 )  the  cor respondi ng 
coefficient  vectors.  Yj  and  u J  are  (Txl). 

Defining  Zj  =  (  Yx  Xj.)  and  aT  =  (b'y  c'j.)'  ,  equation  (7) 
becomes : 

+  Uj  *8> 

Ordinary  Least  Squares  (OLS) 

The  OLS  estimate  of  the  parameter  set  a  is  given  by 
minimizing  the  criterion: 

S=(Y“Za)'»(Y'Za)  ( 9 ) 

Setting  dS/da  =  0  yields: 

(Z'  Z)«a  =  V  *Y 


5  They,  as  well  as  u  ,  are  different  from  those  used  in 
equation  (3).  They  are  now  of  a  different  sign  and  divided 
by  b  for  normalization 
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or  in  our  case: 

=  (Z*  y,  (10) 

This  estimate  is  biased  and  does  not  have  the  large 
sample  property  of  consistency  because  of  the  correlation  of 
the  residuals  with  the  yT  ,  which  violates  one  of  the  basic 
assumptions  of  OLS6. 

Consider  the  standard  linear  model: 

y  =  X»b  +  u  (11) 

X,  the  (TxK)  matrix  of  explanatory  variables,  is  for  this 
model  assumed  independent  from  u  ( T x 1 )  vector.  Then: 

b  =  (  X'  X  ) - 1  »X'  y  =  b  +  (  X'  X  )  ~ 1  «X#  u  (12) 

so  that: 

E  (b)  =  b  since  E(X'u)  =  0. 

In  the  case  of  equation  (10)  however,  since  the  Z7 
terms  contain  the  matrix  Yj  of  endogenous  variables  that  are 
not  independent  of  the  residuals,  the  estimate  az  will  be 
bi ased . 

Nonetheless,  OLS  is  still  used  because  it  was  found  to 
be  more  robust  against  specification  errors  than  some  other 
methods . 


6  The  OLS  assumptions  are  that  the  residuals  satisfy  : 

1.  Zero  mean  E(ut)  =  0  for  all  t. 

2.  Homoscedast ici ty  or  same  variance  VluJ  =  s2  for  all  t. 

3.  Serial  independence  :  ut  and  ut  are  independent  for  all 
t  *  t' 

4.  uk  are  normally  distributed. 

5.  xt  are  independent  of  ut'  for  all  t  and  t' .  E(ufext/)  =  0 
for  all  t  and  ty  . 

6.  The  elements  of  Z  are  linearly  independent.  Hence 
( V  1 ) " 1  exists. 
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Indirect  Least  Squares  (ILS) 

In  ILS  estimation,  the  reduced  form  equations  (obtained 
from  equation  ( 5 ) ) , 

Y  =  X®P'  +  V  (13) 

where  P'  =  -  C'®(B'  )_1  ( KxG ) 

and  V  =  U®  ( B'  )  ‘ 1  (  TxG  ) 

are  estimated  by  OLS,  and  lead  to  the  structural  parameters. 
If  the  equation  is  over i dent i f i ed  (see  section  V),  this 
leads  to  multiple  solutions.  ILS  is  therefore  not  much 
discussed . 

Two-Stage  Least  Squares  (2SLS) 

In  2SLS,  we  first  estimate  the  reduced  form  equations: 

Y.,  =  X®  P^.  +  Vj  (14) 

where  Pj  (KxH)  and  VT  (TxH)  are  derived  from  equation  (13) 
and  the  definition  of  Yj  ( equ .  (7))  by  a  suitable 
partitioning,  to  obtain  the  estimate  of  P^  ,  denoted  .  Let 
us  now  define: 

y j  =  x®p;  (is) 

These  values  are  not  correlated  with  the  disturbance  terms. 
Using  a  basic  Least  Squares  relationship  we  can  write: 

Y,  =  Yj  +  (16) 

where  Vj  is  the  (TxH)  matrix  of  estimated  residuals. 

The  idea  then  is  to  replace  Yj  in  equation  (7)  by  Yj 


thus  yielding: 
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=  Yj  *bj  +  Xj.Cj  +  ux  +  VT  mbr  (17) 

Now  defining  Zs  -  { Yr  X  x )  and  wf  =  Uj.  +  Vs  »bj,  we  obtain  a 
new  form  of  our  equation: 

'  ^*?i  +  «,  (  18) 

The  new  Least  Squares  estimate  is  now  defined  by: 

a  =  (Zj  •ZJ)~1»Z'r  •Yj  (19) 

This  estimate  is  consistent,  has  the  asymptotic 
property  of  efficiency  and  is  the  most  used  in  econometric 
practice. 

Limited  Information  Maximum  Likelihood  (LIML) 

LIML  is  obtained  by  maximizing  a  rather  complex  and 
nonlinear  Likelihood  function  and  is  shown  (under  the 
assumption  of  normality  of  the  disturbance  terms)  to  be 
consistent  and  asymptotically  as  efficient  as  2SLS.  Because 
of  the  computational  burden  2SLS  is  usually  prefered  to  LIML 
in  econometric  practice. 

The  method  is  briefly  outlined  in  section  (IV-B.2)  in  the 
context  of  the  engineering  literature  and  developed  in  more 
detail  in  section  IX. 

B.3  Systems  methods 

The  two  principal  systems  methods,  Ful 1  -  Informat  ion 
Maximum  Likelihood  (FIML),  and  Three-Stage  Least  Squares 
(3SLS)  are  generalizations  of  LIML  and  2SLS  and  are  shown  to 
be  consistent  and  equally  efficient  asymptotically. 
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Because  they  use  more  information,  at  each  stage  than 
the  single  equation  methods  they  are  asymptotically  more 
efficient  than  the  latter.  Just  as  LIML,  FIML  entails 
computational  problems. 

NOTE 

In  addition  to  these  methods,  one  can  also  consider 
what  is  Known  as  the  Instrumental  Variables  (IV)  method, 
where  a  set  of  instrumental  variables  satisfying  certain 
conditions7  [32]  is  selected  and  the  estimator  computed 
as : 

2riy  =  (W'x  mZj)-'  W'T  .yr  (20) 

where  Wj  is  a  set  of  instruments, 
for  a  single  equation,  or  as: 

a  =  ( W»Z ) “ 1 W ®y  (21) 

\v 

where  W  is  a  set  of  instruments,  for  the  whole 
system. 

It  can  be  shown,  [32],  that  ILS,  2SLS,  3SLS  are 
special  cases  of  IV  estimation.  Necessary  and  sufficient 
conditions  for  an  IV  estimator  to  be  efficient  are  also 
der i ved  in  [ 32 ] . 

C.  DEPARTURES  AND  CURRENT  RESEARCH  AREAS 

Departures  from  the  assumptions  of  the  linear 
simultaneous  equations  model  and  of  the  standard  linear 
model  are  treated  extensively  in  the  econometric  literature 
and  constitute  the  bulk  of  current  research. 

7  In  particular  that  they  must  not  be  correlated  with  the 
disturbance  terms 


« 
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Some  of  the  topics  of  interest  are  presented  below. 

C.1  Mu  1 1 icol 1  inear i ty 

The  OLS  estimate  does  not  exist  when,  contrary  to  the 
assumptions,  the  matrix  X  of  explanatory  variables8  has  rank 
less  than  K,  that  is,  if  some  of  the  explanatory  variables 
are  linearly  dependent  or  nearly  dependent  so  that  it 
becomes  difficult  to  disentangle  their  separate  effects  on 
the  dependent  variable.  (This  latter  case  results  in  large 
standard  errors  for  the  coefficients) 

The  problem  with  mul t icol 1  inear i ty  is  not  one  of 
existence  or  not,  but  of  how  serious  the  problem  is. 

There  are  only  some  rough  rules  of  thumb  by  which  we  can 
decide  whether  mu  1 1 i col  1 i near i ty  is  serious  or  not.  (They 
mainly  involve  comparisons  of  the  multiple  correlation 
coefficients  of  the  explanatory  variables,  the  coefficient 
of  multiple  determination9  R2  and  the  partial  r2  ) 

Some  suggested  solutions  are  :  dropping  variables, 
using  extraneous  estimates,  using  ratios  or  first 
differences,  principal  components,  etc...  (see  [33], [22]  for 
more  detai 1 ) 

C.2  Autocorrelat ion 

When  the  residuals  are  correlated  over  time  in 
time-series  it  is  often  a  sign  of  omitted  variables  (which 
are  themselves  serially  correlated).  It  is  then  usually 
assumed  that  the  residuals  form  an  autoregressi ve  process 
(AR),  a  moving  average  process  (MA)  or  a  mixed 

8  e.g.  in  the  case  of  the  standard  linear  model:  y  =  X»b  +  u 

9  See  p.  84 
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autoregressive  movi ng- aver age  process  (ARMA)  and  they  are 
dea It  with  as  such . 

The  problem  of  autocorrelation  is  one  of  the  most 
studied  (see  for  example  [34],  [35],  [36],  [37],  [38])  but 
we  shall  limit  ourselves  here  to  the  examination  of  the 
effects  of  measurement  noise,  bearing  in  mind  that 
autocorrelation  does  affect  the  estimates10. 

C.3  Time-varying  parameters 

The  recent  interest  of  econometricians  in  the 
applications  of  the  Kalman  Filter11  has  contributed  to  a 
great  increase  in  articles  dealing  with  time-varying 
parameters,  and  strucural  changes  in  the  course  of 
estimation.  This  field  is  one  of  the  first  examples  of 
cross- fer t i 1 i zat ion  between  the  two  disciplines. 

Some  papers  of  interest  are  [24],  [27],  [28],  [29]. 

C.4  Errors  in  the  variables 

This  topic  will  be  dealt  with  in  more  detail  here  since 
errors  in  the  variables  have  such  a  drastic  effect  on 
estimation  as  reported  by  Senge  [17]  and  therefore 
constitute  our  main  subject  of  investigation. 

The  standard  linear  model  in  econometrics  assumes  that 
the  neglected  determining  factors  are  the  only  cause  of  the 
fact  that  the  explanatory  variables  do  not  fully  account  for 
the  behavior  of  the  left-hand  variable.  This  is  also  an 
assumption  of  all  the  extensions  of  this  model,  in 


10See  Senge  [17]  for  further  elaboration  on  autocorre 1  at  ion 
in  the  Market -Growth  model 
11  See  section  ( I X - E ) 
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particular  the  simultaneous  equations  model. 

The  assumption  is  rarely  justified  since  most  econometric 
data  contain  observational  errors  and  the  variables  that  are 
actually  measured  are  often  proxies  to  what  was  really 
i ntended . 

The  problem 

The  effect  of  measurement  noise  on  the  OLS  estimates  is 
illustrated  by  the  following  example  from  [33]. 

Suppose  that  the  model  is  : 

y  =  bx  +  e  (22) 

Instead  of  y  and  x,  we  measure  : 

y'  =  y  +  v 


/  - 


=  X  +  u 


(23) 


where  u  and  v  are  measurement  errors  with  zero  means  and 
variances  s2  and  s2,  respectively,  and  are  both  mutually 
uncorrelated  and  uncorrelated  with  the  systematic  parts. 
That  is  : 

E(u)  =  E ( v )  =  0 
Var(u)  =  s,2 
Var(v)  =  s2 

Cov(u,v)  =Cov(u,x)  =Cov(y,v)  =Cov(x,v)  =Cov(u,y)  =  0 
Equation  (22)  can  be  written  in  terms  of  the  observed 
variables  as: 

y’  -  v  =  b(x'  -  u )  +  e 
y'  =  bx7  +  w  (24) 

where:  w  =  e  +  v  -  bu .  (25) 

The  reason  we  cannot  apply  OLS  is  that  Cov(w,x' )  i  0.  In 


< 
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fact , 

Cov(w ,x'  )  =  Cov(-bu,x  +  u)  =  -bs£_  (26) 

Thus  one  of  the  basic  assumptions  of  OLS  is  violated12. 
If  only  y  is  measured  with  error  and  x  is  measured 
without  error ,  there  is  no  problem  because  Cov(w,x' )  =  0 
in  that  case.  Thus,  given  the  specification  (22)  of  the 
true  relationship,  it  is  errors  in  x  that  cause  the 
problem. 

If  we  estimate  b  by  OLS  applied  to  (24),  we  have  : 

r  r  T  T 

b  =  zLx'y'/^x/2  =Zl(x  +  u )  ( y  +v)/2jx  +  u)2  (27) 

Lai  La  I  t- '  til 

and : 

plim  b  =  Cov(x,y)/  (Var(x)  +  Var(u)) 

=  s*y  /(s2  +  s2)  (28) 

as  all  cross  products  vanish.  Since  b  =  s^/s2,  we  have: 

plim  b  -  b/  ( 1  +  s^/s2)  (29) 

Thus,  b  will  underestimate  b.  The  degree  of 
underestimation  depends  on  s^/s2. 

If  there  is  more  than  one  explanatory  variable,  and 
if  all  variables  are  measured  with  error,  the 
expressions  of  the  bias  get  very  complicated,  and  both 
underestimation  and  overest imat ion  are  possible.  [33] 
Attempts  to  solve  the  problem 

Thei 1  [22]  states  that  : 


1 2  See  page  3 1 
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Under  the  condition  that  the  u' s  and  the  v' s 
are  normally  distributed,  one  can  apply  the 
Maximum  Likelihood  method  provided  that  the 
ratio  s2/s2  of  the  error  variances  is  known,  but 
this  provision  is  not  usually  fulfilled;  and 
even  if  it  is,  the  resulting  estimators  of  b,  s^ 
and  s2  are  not  all  three  consistent..."  [22,  p. 

609]  . 

Other  methods  have  also  been  attempted,  none  of 
which  is  really  simple  in  application  and  some  involve 
an  "enlightened  guess"  of  the  error  variance  based  on  an 
evaluation  of  the  quality  of  the  data  !... 

A  more  recent  approach  [39]  shows  that 
econometricians  have  only  recently  started  to  examine 
the  problem  of  measurement  errors  in  Structural  Equation 
Systems  1 3 

Here  again,  the  mi sest imat ion  caused  by  errors  in 
the  variables  specially  when  autocor re  1  at i on  or 
mu  1 1 i co 1 1 i near i ty  are  also  present,  and  the 
identification  problems  encountered  [40],  make  the  issue 
a  very  difficult  one  to  resolve. 

One  of  the  most  promising  algorithms,  the  LISREL 
algorithm  (for"LInear  StructuRal  Estimation  by  maximum 


13In  a  structural  equation  model  each  equation  represents  a 
causal  link  rather  than  a  mere  empirical  association.  In  a 
regression  model,  on  the  other  hand,  each  equation 
represents  the  conditional  mean  of  a  dependent  variable  as  a 
function  of  explanatory  variables.  It  is  this  distinction 
that  makes  conventional  regression  analysis  an  inadequate 
tool  for  estimaing  structural  equation  models  .[39] 
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Likelihood")  is  proposed  by  Joreskog  [41],  [42].  It  is  a 
Maximum  Likelihood  procedure  based  on  the  analysis  of 
the  covariance  structures  of  structural  equation  models 
and  allows  for  both  errors  in  equations  and  errors  in 
variables,  but  it  is  not  Maximum  Likelihood  (only 
consistent)  if  used  in  systems  with  lagged  endogenous 
var i ables . 

A  more  detailed  description  of  this  method  is  given 
i n  section  ( 1 1 1  - D ) . 

C.5  Small  sample  properties 

All  the  properties  established  theoretically  for 
estimators  and  described  above  are  asymptotical  or  "large 
sample"  properties.  However  for  the  practical  choice  of  an 
estimation  method  in  an  econometric  model,  typically  based 
on  small  samples  of  data  (seldom  exceeding  80  observations 
and  frequently  much  smaller),  the  small  sample  properties  of 
the  various  estimators  are  of  crucial  importance. 

For  reasons  of  mathematical  intractability,  relatively 
little  is  known  about  finite  sample  distributions  of  the 
various  estimators.  The  exact  finite  sample  distributions  of 
L imi ted- Informat  ion  Maximum  Likelihood  and  Two-Stage  Least 
Squares  have  been  derived  by  Basmann  and  Nagar  in  certain 
special  cases  [43],  [44],  [45]. 

Nonetheless,  some  tentative  results  can  be  obtained  in 
terms  of  Monte-Carlo  methods  (described  here  as  in  [31]) 
Monte  Carlo  methods  are  similar  to  the  experimental 
procedure  used  by  Senge  (see  section  (VI-B.1)),  except  that 
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for  a  specified  structure,  repeated  samples  of  disturbances 
are  drawn  by  using  appropriate  tables  of  random  numbers.  The 
exogenous  variables  are  combined  with  the  disturbance  values 
for  each  sample  (usually  of  size  15  to  40)  to  generate 
values  of  the  endogenous  variables.  The  estimating  methods 
under  study  are  then  applied  to  each  sample  set.  This 
process  is  repeated  a  large  number  of  times  (typically  50, 
100  or  200)  and  the  resultant  frequency  distributions  of 
estimates  are  studied  in  conjunction  with  the  true  values  of 
the  parameters  in  order  to  conjecture  the  small -sample 
properties  of  the  estimators. 

In  studying  individual  parameters,  three  important 
criteria  are  usually  distinguished.  These  are  bias  (B), 
variance  (V),  and  mean  square  error  (MSE)  and  are  defined 
as : 

B  =  b  -  b  (30) 

where  :  b  is  the  mean  of  the  sampling  distribution  of 
estimates  and  b  is  the  true  value  of  the  parameter. 

v 

V  =  1/N  21  (£>i  '  b)2  131  1 

/'-  I 

s/ 

MSE  =  1/N  1EL  (  bi  -  b2 )  =  V  +  B2  (32) 

i=> 

The  last  formula  shows  that  a  biased  estimate  may  show 
a  smaller  MSE  than  an  unbiased  one  if  it  more  than 
compensates  for  its  bias  by  having  a  smaller  variance. 

A  number  of  we  1 1 -des i gned  Monte-Carlo  experiments  are 
to  be  found  in  the  econometric  literature  (see  [46],  [47], 
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[48],  [49]),  involving  primarily  LIML ,  2SLS ,  OLS,  and  F I  ML . 
The  general  conclusions  emerging  from  them  are  well 
summarized  by  Johnston  in  [31]. 

Some  more  recent  results  can  be  found  in  [50],  [51], 

[52]  . 

Although  these  sampling  experiments  have  yielded 
some  results  of  broad  usefulness,  they  have  not  been 
truly  satisfactory,  if  for  no  other  reason  than  the 
pervasive  suspicion  that  the  results  are  peculiar  to 
the  specification  of  the  models  and  the  structures 
used  in  the  experiments  ..."  [53] 

All  these  studies  have  assumed  ideal  conditions 
and  when  they  have  considered  any  departures,  most 
of  them  have  limited  themselves  to  a  single  nonideal 
si tuat ion . 

In  view  of  the  great  disparity  and  absence  of 
unity  in  the  above  studies,  and  of  the  nature  of  the 
models  used  in  Monte-Carlo  experiments,  no  "a 
priori"  insights  can  be  drawn  for  the  estimation  of 
system  dynamics  feedback  models. . . 

C . 6  Estimation  under  Nonlinear  Specification 

Nonlinearity,  in  the  econometric  context  described 
below,  is  a  nonlinearity  in  the  parameters  and  not  in  the 
variables,  since  nonlinearity  in  the  variables,  in  general 
squares  or  logarithms  and  exponentials,  is  treated  in 
econometrics  as  a  linear  problem. 

Several  extensions  of  the  basic  estimators  such  as 
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Nonlinear  Ordinary  Least  Squares  and  Full  Information 
Maximum  Likelihood  estimators  [54],  the  Nonlinear  Two-Stage 
Least  Squares  estimator  [55],  the  Nonlinear  Limited 
Information  Maximum  Likelihood  and  the  Modified  Nonlinear 
Two-Stage  Least  Squares  estimators  [56],  and  the  Nonlinear 
Three-Stage  Least  Squares  est imator [ 57 ]  have  been  developed 
recently. (  see  also  [58]  , [ 59 ] , [ 60 ] , [ 6 1 ] . ) 

They  generally  lead  to  estimators  possessing  all  the 
desirable  properties  but  they  typically  entail  computational 
problems.  Efficient  computational  methods  for  these 
estimators  are  very  recent  or  still  under  investigation 
[62] , [63] . 

D.  THE  LISREL  MODEL 

The  following  is  a  description  of  the  LISREL  computer 
program  [64]  developed  by  doreskog  and  Sorbom  and  based  on 
Joreskog' s  general  method  for  analysis  of  covariance 
structures  [41],  [42]. 

The  LISREL  model  allows  for  errors  in  both  the 
equations  and  measurement  errors,  and  yields  estimates  of 
the  disturbance  covariance  matrix  and  the  measurement  error 
covariance  matrix  as  well  as  estimates  of  the  unknown 
coefficients  in  a  structural  equations  system,  provided  that 
all  these  parameters  are  identifiable14. 

If  there  are  no  errors  of  measurement,  the  method  is 


14  5ee  section  V  for  the  ident i f i abi 1 i ty  problem 
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equivalent  to  the  Full  Information  Maximum  Likelihood  (FIML) 
method15  provided  that  no  constraints  are  imposed  on  the 
disturbance  and  independent  variables  covariance  matrices. 

The  LISREL  model  consists  of  two  parts:  the  measurement 
model  and  the  structural  equation  model  and  is  formalized  a 
fol lows . 

The  true  model  is  given  by  the  set  of  structural 
equat i ons : 


where , 

B  is  (mxm) 
C  is  (mxn) 
y  is  ( mx 1 ) 
x  is  ( nx 1 ) 
u  is  ( mx 1 ) 


B*y  =  Ox  +  u 

coefficient  matrix 
coefficient  matrix 
vector  of  dependent  variables 
vector  of  independent  variables 
vector  of  random  residuals 


(33) 


It  is  assumed  that  x,  y  and  u  have  zero  mean,  that  u  is 
uncorrelated  with  x  and  that  B  is  nonsingular. 

Instead  of  x  and  y,  we  observe  X  and  Y  given  by  the 
mesurement  model : 


I  =  Hy»y  +  v  ( 34 ) 

X  =  Hx»x  +  w  (35) 


where  : 

-  Y_  (pxl)  and  X  (qxl)  are  vectors  of  obseved 
var i ables 1 6 , 

-  v  and  w  are  vectors  of  measurement  errors  in  y  and  x 
respectively;  they  are  assumed  uncorrelated  with  the 


1 5  See  p .  33 

1 6y  and  x  are  then  referred  to  as  "latent  variables" 
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latent  variables, 

Hy  (pxm)  and  Hx  (qxn)  are  regression  matrices  of  _Y  and 
X  on  y  and  x  respectively. 

If  in  addition  the  following  matrices  are  defined  : 
(J)  (nxn)  :  covariance  matrix  of  x 
Ip  (mxm)  :  covariance  matrix  of  u 
8v  (pxp)  :  covariance  matrix  of  the  noise  v 
8w  (qxq)  :  covariance  matrix  of  the  noise  w 
it  can  be  shown  [41]  that  the  covariance  matrix  XT  of 
z  =  (X'  X'  )'  is  a  function  of  Hy,  Hx,  B,  C,  (p ,  Ip,  8u  and  8w. 

The  program  is  based  on  the  analysis  of  the  usual 
sample  covariance  matrix  S  : 

v_ 

s  =  1/ ( N  -  1)  2L.  Ozi  -  2 )  (zi  -  z  ) '  }  (36) 

i  r  i 

The  estimation  problem  is  then  essentially  that  of  fitting 
the  2Z  imposed  by  the  model  to  the  sample  covariance  matrix 
S.  This  is  done  using  the  Likelihood  function.  The  logarithm 
of  the  Likelihood  fuction,  ommiting  a  function  of  the 
obsevations  can  be  written  as  : 

Log  L  =  -  (N  -  0/2  { Log |  S  |  +  tr(S.  S  '  O  }  (37) 

This  function  is  to  be  maximized  with  respect  to  0,  the 
vector  of  independent  unknown  parameters  in  B,  C,  (p ,  Ip,  8u, 
8w . 

The  minimization  is  done  by  means  of  a  modification  of 
the  iterative  method  of  Fletcher  and  Powell  [65]  described 
by  Gruvaeus  and  Joreskog  [66].  The  minimization  method  makes 


, 
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use  of  the  first  derivatives  and  approximations  to  the 
second  order  derivatives  of  F17  and  converges  rapidly  from 
an  arbitrary  starting  point  to  a  local  minimum  of  F.  If 
there  are  several  minima  of  F,  there  is  no  guarantee  that 
the  method  will  converge  to  the  absolute  mininum.  At  the 
minimum  of  F,  the  Information  matrix18  may  be  computed  and 
used  to  compute  the  standard  deviations  for  all  estimated 
parameters . 

The  package  also  allows  for  a  test  of  the  goodness  of 
fit  of  the  model  which,  together  with  an  inspection  of  the 
first  derivatives  of  F  can  suggest  improvements  in  the  model 
specification  (see  [64]  for  more  details). 

The  assumption  of  independent  obervations  however, 
rules  out  dynamic  models  which  include  lagged  endogenous 
variables  observed  with  mesurement  errors.  In  that  case,  the 
LISREL  method  is  consistent,  but  not  Maximum  Likelihood 
[41]. 


17  F  =  1/2  { log  |  ZT  |  +  t  r  (  S  •  2EI  ~ 1  ) }  is  the  actually 
minimized  function 
18 See  section  (IV-B.2) 
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IV.  THE  ENGINEERING  LITERATURE 

The  problem  of  identification  and  system  parameter 
estimation  has  received  extensive  attention  from  the 
engineering  community  as  well,  specially  in  the  last  two 
decades,  following  the  development  of  modern  control  theory 
and  the  pioneering  work  of  Kalman. 

In  an  excellent  article,  Astrom  and  Eykhoff  [67]  survey 
the  state  of  the  art  citing  over  230  references  dealing  with 
this  problem.  There  is  also  an  increasing  amount  of 
literature  which  deals  with  the  problems  of  stochastic 
control  in  general  (see  for  example  [68],  [69]) 

A.  NOTATION 

By  and  large,  the  engineering  estimation  techniques, 
Least  Squares,  Maximum  Likelihood,  Instrumental  variables, 
etc..,  are  very  similar  if  not  identical,  to  some  of  the 
econometric  methods  outlined  in  section  III,  although  using 
different  notations  and  terminology. 

The  following  presentation  of  some  main  features  of  the 
engineering  notation  and  methods  is  based  on  Eykhoff  [20]. 

If  the  process  output  is  given  by: 

z ( t ) =  F(u,b,n)  (  1  ) 

and  the  output  of  a  model  by: 

zm{  t )  =  F (u ,b, 0 )  (2) 
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where : 

b  is  the  vector  of  process  parameters 
u  is  the  processs  and  model  input 
n  is  the  noise 

b  is  the  vector  of  parameter  estimates, 
the  estimation  problem  may  be  formulated  as  a  minimization 
of  ( see  Eykhof f ) : 

■  the  functional  )  of  a  measurable  quantity: 
i  .  e  .  :  G{z  ( t ,  b )  -  zm(  t  ,b) } 

-  the  fuction(al)  of  an  unmeasurable  quantity: 
i.e.:  E{G(b-b)}  E:  expectation 

In  the  control  literature,  a  number  of  different 
estimation  procedures  have  been  developed.  They  differ 
predominantly  in  the  criterion  used  for  defining  optimality 
and  in  the  use  of  available  "a  priori"  knowledge.  Evidently, 
there  is  a  close  relationship  between  the  amount  of 
knowledge  available  and  the  type  of  "optimum"  procedure  that 
can  be  used  to  estimate  the  parameters. 

The  characteristics  of  the  Least  Squares,  Markov  or 
Generalized  Least  Squares,  Maximum  Likelihood  and  Bayes 
estimators  will  be  considerd  here.  The  amount  of  assumed 
initial  knowledge  increases  in  this  order .  Starting  from  the 
Bayes  estimator,  one  can  derive  the  other  estimators  as  a 
special  case  if  there  is  less  "a  priori"  knowledge  available 
and  under  certain  conditions  of  normality. 

The  problem  can  be  stated  as  follows  (see  Figure  I V - 1  )  : 

We  want  to  derive  an  estimator  (relationship): 
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FIGURE  I  V- 1  : 
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Engineering  formulation  of  the  Identification  problem 
( f  rom  [ 20 ] ) 
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&  =  6{u(  1  ) . u(T)  ,z(  1  ) . z(T  ) }  =  b{  u'  ,z'  }  (3) 

such  that  a  numerical  value  b  (estimate)  can  be  assigned  for 

b. 

B.  SOME  ESTIMATION  TECHNIQUES 

Some  types  of  estimators  are  then: 

B . 1  Bayes  Estimator 

As  a  starting  point,  let  us  choose  a  situation  where 
much  "a  priori"  Knowledge  is  available: 

1.  The  probability  density  function  of  the  noise  n;  from 
this  p(z)  follows,  and  since  it  is  dependent  on  b,  it 
will  be  noted  p ( z/b ) . 

2.  The  probability  density  function  of  the  parameter  values 
b  :  q  ( b  ) 

3.  The  cost  of  choosing  b  for  the  estimate  if  the  true 
value  of  the  process  is  b:  i.e.  a  "loss"  or  "cost" 

f unct ion  C ( b , b ) . 

Then,  from  Bayes'  rule: 

p(z/b)»q(b)  =  p(z,b)  =  p(b/z)»p(z)  (4) 

we  get 

p(b/z)  =  p(z,b)/p(z)  =  p(z/b)»q(b)/p(z)  (5) 

where 

p(y)=j^ . ^  p(z,b  )  •d*141  b  (6) 

m  +  i 

p ( b/z )  is  the  "a  posteriori"  probability  density  function  of 
the  parameters  b,  given  the  results  of  the  measurements  on 
z,  say  z  =  c. 
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Then  the  conditional  risk  of  choosing  b( z)  if  the  parameter 
is  b  can  be  written  as  the  expectation  of  the  cost  function 
with  respect  to  the  observations  z. 

The  average  risk  is  the  expectation  with  respect  to  the 
probability  of  the  value  of  b: 

R(b)  =  {E2/JC(b,b] }  (7) 


The  estimate  that  minimizes  this  expression  is  called 
the"  Minimum  Risk"  or  "Bayes"  estimate. 

We  have  from  Bayes'  rule: 

c  c  r  K  f  f  f  jh  4 1 

R{b)  =  d  z»p(z)  C(b,b)»p(b/z)*d  b  ( 8 ) 

k  JJ  J 


m*  i 


min  R  (b)  =  min  . J*  C  ( b,  b  )  »p  ( b/c  )  »d  b 


(9) 


m-t- 1 


since  P(z)  >  0  for  z  =  c  observed  sample. 

Then,  the  necessary  condition  is: 

d/db  (Jj  C  (b,b )  »p(b/c  )  •drt1+’'  b)  =  0  at  b  =  b  (10) 

m  t- 1 


B.2  Maximum  Likelihood  Estimator  (MLE) 

We  now  drop  assumptions  2  and  3.  The  only  "a  priori" 
knowledge  left  is  p(z/b) . 

Before  measurement,  we  know  that  the  samples  of  z  are 
random  variables  and  have  the  joint  probability  function: 

p{z ( 1 ) , . . . . z ( T ) /b}  (11) 

After  measurement,  we  know  the  observed  sample  values: 

z(  1  )  =c  , . z(  T  )  =cr  (  12  ) 

We  write  the  dependence  between  c;  and  b  as: 


.  .  - 
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L  {c, . cT  |  b } 

However,  this  can  also  be  considered  as  a  function  of  b 
given  the  sample  and  is  Known  as  the  Likelihood  function. 

The  philosophy  of  the  Maximum  Likelihood  algorithm  is 
developed  in  more  detail  in  section  IX. 

For  the  sake  of  convenience,  Log  L  is  usually  considered. 
Since  the  logarithm  is  a  monotone  function,  the  maximum  of  L 
with  respect  to  b  can  be  obtained  by  solving: 

6/db  Log  {L(c;b)}=0atb=£  (13) 

Some  of  the  properties  of  the  ML  estimator  are: 

-  consistency 

-  invariance:  if  b  is  the  MLE  for  b,  gib)  is  the  MLE  for 
g(b)  . 

-  asymptotic  unbiasedness:  E  (b)  =  b  for  t-->co  . 

-  asymptotic  normality:  p(b,b)  approaches  a  normal 
distribution  for  t-+oo  . 

-  asymptotic  efficiency:  approaches  the  best  accuracy  or 
minimum  variance  as  given  by  the  Cramer-Rao  inequality. 
This  inequality  states  (only  the  form  for  an  unbiased 
estimator  is  given  here)  that  for  any  estimator  b  of  b: 

Var  (b)  >  -1/  E {  d2  Log  L/  db2}SP>(14) 
This  called  the  "Minimum  Variance  Bound".  Or,  for  the 

multidimensional  case: 

Cov .(b)  >  J-1  (15) 

where : 

J  =  E {  d2  Log  L/  db»db'  }  (16) 

J  is  called  the  "Fisher  Information  Matrix"  and  puts  a 


bound  on  the  maximum  achievable  accuracy.  A  detailed 
discussion  of  this  topic  is  provided  by  Dhrymes  [30]. 
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In  view  of  all  these  properties,  the  use  of  ML 
estimates  is  highly  desirable. 

B.3  Markov  and  Least  Squares  Estimates 

These  estimators  belong  to  the  class  of  linear  unbiased 
estimators,  i.e.  those  that  satisfy  the  following 
relat ionships : 

,  b  =  Q«z  (17) 

Lib)  =  b  (18) 

Given  the  process: 

z  =  U»b  +  n  (19) 

and  the  model : 

z  =  U mb  (20) 

where  U  is  the  matrix  of  observed  inputs  and  n  is  the  noise, 
define  the  error  as: 

e  =  z  -  zm  =  U»(b  -  b)  +  n  (21) 

and  the  error  criterion  to  minimize  ,  as  the  quadratic  form: 

E  =  e' Re  (22  ) 

where  R  is  a  matrix  of  weighting  coefficients. 

Then  the  minimum  is  given  by: 

dL/  db  =  0  at  b  =  b  (23) 

b  =  ( IT  RU)"1*!^  Rz  (24) 

This  is  known  as  the  general  "Weighted  Least  Squares" 
estimate. 

If  R  is  chosen  as  N~ 1  where  N  =  E(nrV  )  is  the 
covariance  matrix  of  the  noise  process  (assumed  known) , 
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then : 

b  =  (U#  NMU)*  i»U#  N"  'z  (25) 

is  Known  as  the  Markov  or  Generalized  Least  Squares 
estimate. 

I  f  we  set  R  =  I  (we  do  not  Know  N) ,  then  we  get  the  well 
Known  Least  Squares  estimate: 

b  =  (U7  U  )  "  1  •U'  z  (26) 

These  estimates  are  unbiased  by  construction, 
consistent  under  certain  regularity  conditions19,  and  have 
the  property  of  minimum  variance:  i.e.,  for  any  other  c 

Cov(c)  >  Co v(b)  ( 27 ) 

If  the  noise  has  a  normal  distribution,  these 
estimators  can  be  derived  from  the  Maximum  Likelihood 
estimator.  In  our  continuous  relaxation  of  "a  priori" 
Knowledge,  let  us  now  assume  that  the  only  Knowledge 
available  is  that  of  N  =  E(nn7 ). 

Then : 

p(n)  =  (1/(2ti)^  •  |  N  |  V*  )»exp.  {  (  -  1  /  2  )  •  (  n '  N  ~  1  n  )  }  (28) 

where  E(n)  =  0  and  E(nrV  )  =  N. 

We  can  now  write: 

Log  p(z  -  Ub)  =  -(1/2)»Log  (  (  2 ti )  •  |  N  |  ) 

-  (  1/2M  (z  -  Ub)7  •N- 1  •  ( z  -  Ub)  ]  (29) 

Maximizing  this  likelihood  function  leads  to: 

d  tdb  {(z-  Ub ) 7  »N" 1 • (z  -  Ub)}  =  0  at  b  -b  (30) 

or : 

b  -  (  U7  N" 1  U  ) ' 1  »U7  N'  ''z  (  31  ) 
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See  p.  31 
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which  is  the  Markov  estimate. 

If  in  addition  we  relax  Knowledge  of  N,  i.e.  the  "a 
priori"  knowledge  is  nil,  we  obtain: 

b  =  (U#  U  )  ” 1  •U/  z  (32) 

the  Ordinary  Least  Squares  (OLS)  estimator. 

Thus,  the  OLS  and  the  GLS  (or  Markov)  estimators  can  be 
derived  from  the  MLE  under  the  assumption  of  guaussian 
noise.  Under  these  conditions,  all  these  estimators  yield 
the  same  estimates  and  attain  Minimum  Variance  Bound. 

The  presentation  of  estimation  techniques  is  limited  to 
this  rough  outline  of  the  main  algorithms.  A  great  number  of 
related  techniques  have  been  developed  to  overcome  specifc 
problems  that  arise  in  practice  in  model  building  (see  for 
example  Thei 1  [22],  Eykhoff  [20],  etc...) 

C.  CURRENT  RESEARCH 

Contrarily  to  current  research  in  econometrics,  where, 
because  of  the  wide  acceptance  and  history  of  practice  of 
the  simultaneous  equation  model,  most  of  the  research  is 
directed  towards  improving  on  existing  estimation  techniques 
or  refining  the  paradigm  in  the  area  of  nonideal  situations, 
the  engineering  trends  of  current  research  are  much  more 
oriented  to  questioning  accepted  practices  and  investigating 
new  theoretical  approaches,  probably  because  as  they  enter 
the  social  field  with  their  new  tools  and  their  "systems 
approach",  engineers  find  traditional  methods  and  outlooks 
inapropriate  for  the  magnitude  of  the  task  and  the  nature  of 
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the  problems  to  solve. . . 

The  following  few  examples  are  only  intended  as  an 
illustration  of  this  trend  and  constitute  by  no  means  a 
review  of  current  research. 

C.1  Model  Validation  -  Sensitivity  Testing 

In  their  effort  to  select  the  appropriate  variables  to 
include  in  their  models,  most  of  the  econometric 
mode  1 -bui lder s  have  used  single  equation  statistical  testing 
regardless  of  whether  the  system  under  consideration  is 
recursive  or  simultaneous,  although  the  limitations  of  such 
techniques  have  been  recognized  for  a  long  time. 

Such  tests  include  the  popular  "t-test"  of  parameter 
significance,  the  analysis  of  residuals,  the  Durbi n-Watson 
statistic  and  the  partial  correlation  coefficients. 

The  usefulness  of  these  tests  has  come  to  be  more  and 
more  criticized  recently  in  both  the  econometric  and 
engineering  field. 

In  particular,  Mass  and  Senge  [70]  have  found  that 
these  tests  provide  information  only  on  the  precision  with 
which  a  given  parameter  can  be  estimated  from  the  available 
data  and  not  on  the  importance  of  the  associated  variable  on 
determining  model  behavior,  that  reliance  solely  on  these 
tests  for  model  specification  can  lead  to  the  rejection  of 
important  parameters  on  the  basis  of  statistical 
"insignificance",  and  they  argue  that  "only  tests  that 
examine  a  variable's  influence  on  overall  model  behavior 
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provide  a  sound  basis  for  assessing  an  individual  variable's 
importance . " 

Basing  their  argument  on  the  use  of  system  dynamics  feedback 
models  for  social  sciences,  they  then  proceed  to  develop  an 
alternative  testing  approach  of  the  "model -behavior "  kind 
and  provide  two  examples  of  application  of  their  technique. 

A  recent  case  study  by  Ford  et  al .  [71]  supports  this 
approach  by  a  very  detailed  study  of  the  behavior  of  large 
computer  models. 

This  theme  is  expanded  on  in  the  conclusion. 

C.2  The  use  of  Information  Theory 

In  a  number  of  fields  of  physical  and  engineering 
science,  theorems  have  been  derived  with  respect  to  the 
maximum  result  obtainable.  For  example  the  Carnot  cycle  in 
thermodynamics,  Heisenberg's  relationship  in  physics  , 
Shannon's  maximal  channel  capacity  in  communication  theory. 
In  the  fields  of  parameter  and  state  estimation,  there  is  a 
need  too  for  such  limiting  theorems. 

"  It  would  be  most  useful  if  it  were  possible  to 
indicate  the  maximum  amount  of  information  about  a 
parameter  or  a  state  vector  that  could  be  derived  in 
clearly  specified  situations.  In  that  way , di f ferent 
estimation  techniques  or  schemes  applied  to  a  given 
situation  could  be  compared  just  as  communication 
techniques  can  be  evaluated  by  communication 
(information)  theory."  [20] 
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The  comparison  of  parameter  or  state  estimation  and 
information  theory  suggests  itself  very  strongly.  The  values 
to  be  estimated  are  the  messages  sent  by  an  information 
source;  the  transmitter,  channel,  receiver  represent  the 
information  processing  scheme;  the  noise  stands  for  the 
disturbances  in  process  and  measurement  devices. 

Some  problem  statements  have  already  been  made  by 
Zaborsky  [72],  Caines  [73]  and  others. 

The  best  limiting  theorem  available  at  present  is  the 
Cramer-Rao  bound  which  provides  Knowledge  of  the  accuracy 
that  can  be  obtained  under  certain  prespecified  conditions. 

A  great  part  of  current  research,  whether  dealing  with 
i dent i f i ab 1 i ty ,  estimation  algorithms  or  model  structure  is 
moving  in  the  direction  of  providing  such  bounds. 

Information  theoretic  methods  have  been  suggested  by 
many  authors  for  the  solution  of  the  related  problems  of 
hypothesis  testing,  signal  detection  and  model 
identification.  In  recent  years,  Kullback's  information 
measure  [74]  and  Bhat tachar rya' s  metric  have  been  made 
popu 1 ar  [ 75  ]  ,  [ 76 ] . 

The  identi f iabi 1 i ty,  model -structure  determination  and 
estimation  problems  are  not  seen  as  separate  anymore  but  are 
being  increasingly  integrated  in  one  single  process.  A  few 
examples  of  this  trend  are  Akaike's  "information  criterion" 
(AIC),  [77], [78]  based  on  a  Maximum  Likelihood  approach 
formulated  in  such  a  way  that  it  eliminates  the  need  for 
subjective  judgement  in  hypothesis  testing,  Rissanen's  use 
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of  an  entropy  measure  [79],  Tse's  quantitative  measure  of 
ident i f i abi 1 i ty  [80],  and  Caines'  notion  of  "predictor  set 
completeness"  [73]... 

C.3  Case  Studies 

In  surveying  the  actual  estimation  algorithms  used  in 
the  determination  of  the  models  of  interest  to  the 
engineering  literature,  the  Maximum  Likelihood  method 
strikingly  appears  as  the  most  used  and  the  most  powerful, 
specially  in  its  "filtering  form"  which  provides  the  best 
algorithm  to  date  for  the  estimation  of  state-space  models 
with  both  process  and  measurement  noise.  In  view  of  its 
importance  and  since  it  is  the  main  topic  of  this  thesis, 
the  method  and  a  few  applications  are  presented  separately 
in  section  IX. 


V.  THE  IDENTIFIABILITY  PROBLEM 

One  of  the  major  problems  that  arise  in  the  estimation  of 
multivariate  models  is  the  " ident i f i abi 1 i ty"  problem  (the 
"identification"  problem  in  the  econometric  literature;  the 
control  "identification"  is  equivalent  to  the  econometric 
" speci fi cat  ion" ) . 

This  can  best  be  illustrated  by  an  example  taken  from 
econometr i cs . 

Consider  the  following  simple  demand  and  supply  system: 

Demand:  De :  qt=  a  +  b»pt  +  ut  (1) 

Supply:  Se :  qk=  c  +  d«pt  +  vt  (2) 

where  qt  is  the  quantity  bought  at  t,  and  pt  is  the  price  of 
the  commodity  at  that  time. 

Let  us  take  a  linear  copmbination  of  the  relations: 

D,  =  k«D0  +  ( 1-k)»S0  (3) 

S,  =  1 »D0  +  ( 1  -  1 ) »S0  (4) 

we  get : 

D(  :  qfc  =  a»k  +  c  (  1  -  k )  +  pt»[(b»k  +  d*(1-k)] 

+  k»ut  +  ( 1 -k ) »vt  ( 5 ) 

S,  ;  qfc  =  a»l  +  c*  (  1  -  1  )  +  pt»[(b»l  +  d  (  1  - 1  )  ] 

+  1 »ut  +  ( 1  -  1 ) »vfe  ( 6 ) 

As  illustrated  in  Figure  V - 1 ,  (D0tS0)  and  (D,,S,)  are 
observat iona 1 ly  equivalent:  that  is,  given  an  observation  on 
q  and  p  ,  there  is  no  way  to  decide  which  is  the  true 

model . 

Now,  if  we  take  for  : 

qt  =  c  +  d*pk  +  n»rt  +  vt  (7) 
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FIGURE  V- 1 :  Observa t i ona 1 1 y  equivalent  models 
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where  rt  is  "rainfall"  and  rt  does  not  appear  in  D0  ,  then, 
any  linear  combination  of  D0  and  Sc  will  not  be  an  equation 
of  the  form  D0  (because  of  the  term  in  rt )  and  therefore, 
the  equation  D0  is  identified.  However,  in  this  case,  S0  is 
still  not  ident i f ied . 

More  formally,  let  our  model  be: 

S,  :  B»yk  +  Ox  t  =  u  t  (  8 ) 

wi  th : 

E(ufc)  =  0 

E(ut,u,fc  >  =  zr 

And  P (y t/x k )  the  conditional  distribution  of  yt  given  xt •  It 


two  different  structures  yield  the  same  conditional 
distribution  of  yt,  they  are  called  "observat iona 1 ly 
equivalent"  and  hence  are  not  identifiable. 

To  obtain  P(yfc/xt),  we  write: 

P(yt/xi,)  =  P(ut/xk).|d|  =  P  (u  t/xk  )•  |cfu  t/dy  t|  (9) 

where  J  is  the  Jacobian. 

Since  cfub/  dy  t  =  B,  equation  (9)  can  be  written: 

P(yt/xfc)  =  P{(ut  =  B»yk  +  Ox  J /x  |  B  |  (10) 

Suppose  we  premultiply  (8)  by  an  nonsigular  matrix  F  to 
obtain  the  system  S2  : 

S2  :  FByk  +  PCxk  =  Fu k  =  v  t  (11) 

and  we  have: 

E(vt)=  F • E ( u  fc ) =  0  ( 12) 

E(yt,y't)  =  E(Fut,u'tF'  )  =  F.  2Z  .F'  (13) 

Then : 

P(yt/it)  =  p (yt/xt ) •  |tfyfc/tfy  L|  (14) 
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where  : 


and 


but : 


dyt/dyt  =  FB 

P(vt/xt)  =  P  (ufc/xt)  •  |du  t/dv  J 


(15) 

(  15) 


so  that 


du t/dv  t  =  F ' 1 


(  17) 


P(yt/xt)  =  P ( u t/xt ) • | F - 1 | • | FB |  =  P ( ut/x  t) • | B |  (18) 

Thus  we  get  the  same  distribution  for  y,  and  S,  and  Sz 
are  observat iona 1 1y  equivalent. 

Econometric  theory  then  goes  on  defining  "admissible" 
tranformat ions  F  that  yield  equivalent  systems,  and 
conditions  for  ident i f i abi 1 i ty . 

If  the  system  is  written  as: 

A»z t  =  ut  (19) 

where : 

A  =  (B  C)  (20) 


zt  =  (y'fc  x't )'  (21  ) 

and  if  there  are  R  linear  homogeneous  restrictions  on  (say) 
the  first  row  of  A,  written  as: 

=  0  (22) 


where : 

*i=  (a„  , . a,„  )  <23) 

then,  the  first  equation  is  " just- identi f ied"  if  : 

rank  (  A»(pj  )  =  G- 1  (  24  ) 

where  G  is  the  number  of  equations. 

If  rank(A»(p|  )  >  G-1  (<G-1)  ,  the  system  is  over 
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(under ) - identi f ied. 

In  the  latter  case,  the  equation  is  inestimable.  A 
similar  condition  exists  for  the  covariance  matrix  of  the 
di s turbances . 

Thus,  before  starting  any  estimation  procedure,  one  has 
to  check  the  just (or  over)  ident i f i abi 1 i ty  of  each  equation, 
and  if  needed,  add  restrictions  to  make  it  just (over) 
i dent i f i ed . 

For  a  more  detailed  exposition  of  this  problem,  see  Fisher 
[81]  . 

Control  engineers  have  independently  dealt  with  this 
problem  (see  for  example  Tse  and  Anton  [82])  using  a 
different  terminology  and  a  different  approach.  A  general 
result  for  linear  systems  is  that  if  the  minimal  dimension 
of  the  system  is  known20  controllability,  observability  and 
stability  imply  ident i f i abi 1 i ty2 1  [82] 

Mehra  [24]  states  some  additional  conditions;  in 
particular,  that  the  system  be  of  minimal  order  is 
equivalent  to  say  that  only  the  proper  canonical 
representation  is  identifiable. 

It  is  for  this  reason  that  here,  and  in  a  great  part  of 
current  research,  minimal  forms  of  state  space  equations 
assume  primal  importance  (see  Akaike  [77]). 


20For  the  definition  of  "minimal  dimension",  a  detailed 
exposition  o  the  relationships  between  various  forms  of  a 
state-space  model  and  their  relationship  to  the  econometric 
formulation,  see  Mehra  [24]. 

21  All  these  concepts  derive  from  modern  control  theory  and 
are  defined  in  e.g.  [83] 
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It  is  worth  noting  that  the  econometric  conditions  were 
formulated  in  terms  of  each  equation  whereas  in  control 
theory,  i dent i f i abi 1 i ty  of  the  whole  system  is  examined.  The 
conditions  in  terms  of  ranks  of  matrices  amount  to  the  same 
idea,  however. 

Tse  and  Anton  [82]  also  give  detailed  conditions  for 
the  i dent i f i abi 1 i ty  of  parameters  in  terms  of  their 
probability  distribution  function  and  the  Maximum  Likelihood 
principle. 

In  practice  however,  the  great  majority  of  the  applied 
literature  examines  the  ident i f i abi 1 i ty  of  a  model  only  when 
faced  with  estimation  problems  (ill  condi tionned  or  singular 
matrices,  physically  nonreal izable  parameter  estimates  and 
large  associated  error  covariances).  The  usual  approach,  as 
outlined  by  Mehra  [84],  is  then  to  try  fixing  some  of  the 
parameters,  to  use  a  priori  weighing  for  some  of  the 
parameters,  to  try  constrained  optimization  or 
rank-deficient  solutions.  All  of  these  methods  depend 
entirely  on  the  modeler's  judgement... 


VI.  EXPERIMENTAL  WORK 


A.  INTRODUCTION 

As  stated  earlier,  (see  section  ( I  -  D )  )  ,  the  purpose  of 
the  present  work  is  to  investigate  the  behavior  of 
alternative  estimation  techniques,  drawn  from  both  the 
econometric  and  engineering  fields,  on  the  Market -Growth 
model . 

Senge' s  conclusion  that  the  OLS  and  GLS  were  inadequate 
to  estimate  system  dynamics  models  together  with  the 
preceding  review  of  existing  econometric  and  engineering 
methods  show  that  from  the  many  techniques  available,  only  a 
few  might  suit  our  needs,  that  is,  the  estimation  of  a 
non  1 i near - i n- the- var i ables  feedback  model  in  the  presence  of 
both  equation  and  measurement  noise. 

Since  the  survey  of  estimation  techniques  provides  no 
information  as  to  the  "a  priori"  performance  of  the  various 
estimators  in  this  particular  situation,  (and  even  less  if 
we  were  to  decide  on  the  ident i f i abi 1 i ty  of  these  models), 
it  is  ultimately  experimental  work  that  will  give  us  an  idea 
of  their  performance.  This  however  will  not  provide  a  basis 
for  generalization,  as  pointed  out  by  Senge  [17].  It  is  only 
when  enough  knowledge  has  been  accumulated  from  various 
sources  that  .hopefully,  some  conclusion  can  be  derived. 

Only  two  of  the  methods  of  interest  to  us  were 
available  as  estimation  packages  or  subroutines  at  the 
University  of  Alberta,  namely  the  2SLS  (widely  used  in 
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econometric  work)  and  the  LISREL  model.  The  use  of  the 
Filtering  form  of  Maximum  Likelihood  function,  although 
amply  documented  by  Mehra  [84],  was  not  available  as  a 
package  and  therefore  a  program  had  to  be  written  in  order 
to  use  this  method22 

The  following  sections  describe  the  experimental 
framework  and  the  results  and  analysis  of  the  various 
experiments.  As  a  first  step,  Senge' s  experiments  were 
repeated  using  OLS,  in  order  to  confirm  his  results  and  to 
have  a  basis  for  comparison  with  the  other  techniques  in  the 
same  environment.  An  alternative  econometric  method,  the 
LISREL  model,  has  been  tried  an  found  unsuccessful.  The 
results  are  not  reported  here.  A  computer  program  based  on 
Mehra' s  formulation  of  the  Filtering  Form  of  the  Maximum 
Likelihood  algorithm  was  then  developed,  tested  on  lower 
order,  simple  models  and  finally  applied  to  the 
Market -Growth  model . 


22  A  general  PL/I  program  called  GPSIE  for  "General  Purpose 
System  Identifier  and  Evaluator"  was  developed  by  D. 
Peterson  [85]  in  1976.  It  has,  among  other  possibilities, 
the  capacity  to  compute  Full  Information  Maximum  Likelihood 
(FIML)  estimates  of  parameters  in  a  dynamic  state-space 
model  in  the  presence  of  equation  and  measurement  noise  and 
is  based  on  the  work  of  Schweppe  [86],  [87].  This  however 
was  not  available. 

During  the  course  of  this  research,  a  later  version  of  this 
program,  closely  liked  to  the  DYNAMO,  called  DYNASTAT  was 
also  developed  at  MIT. 


68 


B.  THE  EXPERIMENTAL  FRAMEWORK 

The  specific  details  and  models  for  each  estimation 
technique  are  described  in  detail  in  the  respective 
sections.  A  brief  overview  of  the  framework  developed  by 
Senge  [17]  is  presented  here. 

B . 1  The  experimental  procedure 

A  data  generating  model  (the  parameters  of  which  are 
known)  having  properties  of  real-life  systems  (feedback, 
non  1 i near i t i es ,  complex  dynamics,  noise)  generates  synthetic 
"perfect"  data.  This  data  passes  through  a  measurement 
process  which  can  reflect  imperfections  in  data  measurement 
by  sampling,  including  a  random  component  or  withholding 
synthetic  data  for  a  particular  variable  to  reproduce 
unavailability  of  data. 

The  investigator  also  specifies  a  second  model  with 
unknown  parameters  (the  estimation  model)  allowing  for 
specification  errors  if  he  wishes  to  reproduce  imperfections 
in  model  specification. 

The  estimation  technique  under  test  can  then  be  applied 
to  the  estimation  model  and  using  synthetic  data,  yields  a 
set  of  estimated  parameters  and  statistics  for  hypothesis 
test i ng . 

By  comparing  the  estimated  parameter  values  to  the 
values  of  the  "true"  system,  the  investigator  can  asses  the 
accuracy  of  the  estimation  technique.  By  analysing  the 
statistical  test  results,  he  can  find  out  whether  or  not  the 
tests  reliably  alert  the  modeler  to  inaccuracies.  The 
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process  is  illustrated  in  Figure  VI-1. 

B.2  Variations  in  experimental  conditions 

Figure  VI-2  illustrates  the  breadth  of  experimental 
conditions  which,  according  to  Senge  [17],  must  be 
investigated  separately  and  in  combination  in  order  to  test 
thoroughly  the  accuracy  of  the  estimation  technique  under  • 
consideration  and  the  effect  on  estimation  results  of 
different  aspects  of  real- life  mode  1 -bui ldi ng  and  data 
measurement . 

Figure  VI-2  is  self  explanatory.  For  a  detailed 
discussion,  see  Senge  [17]. 

This  is  not  in  itself  a  new  idea.  This  framework  is 
exactly  the  same  as  the  one  developed  for  Monte-Carlo 
experiments23  in  the  econometric  1 i terature  except  that  the 
method  used  there  are  even  more  sophisticated  (as 
demonstrated  for  example  by  the  concept  of  Mean  Square  Error 
MSE24).  It  is  quite  simplistic  to  use  only  "a  few"  runs  to 
"compare"  parameter  values  and  is  a  weakness  in  Senge' s 
reporting  of  results  as  compared  to  the  econometric 
1 i terature . 

The  new  idea  here  is  that  "realistic"  conditions  must 
be  met  when  testing  these  estimation  techniques,  which  is 
not  the  case  of  most  econometric  Monte-Carlo  experiments  as 
pointed  out  in  section  (III-C.5). 

In  particular,  in  the  present  situation,  the  problem  of 
complex  nonlinear,  noisy  "system  dynamics"  models  that  have 

23See  section  (III-C.5) 

2 4  See  p.  41 
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FIGURE  VI-1:  The  experimental  procedure 
( from  [ 17] ) 
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not  been  built  with  estimation  in  mind  and  that  therefore 
reproduce  many  aspects  of  real-life,  are  simultaneously 
exami ned . 

In  view  of  the  difficulties  encountered  with  the 
problem  of  measurement  noise  alone,  the  present  approach  was 
limited  to  the  investigation  of  its  effects,  and  no 
attention  was  given  at  this  stage  to  the  problem  of 
mi sspeci f i cat i on  investigated  by  Senge  [17]. 


. 


V 


VII.  REPRODUCING  SENGE' S  EXPERIMENTS 


A.  THE  "MARKET  GROWTH"  MODEL 

The  Market -Growth " 2 5  model  was  developed  by  Forrester 
[19]  as  an  illustration  of  system  structure  concepts  in 
system  dynamics,  and  shows  how  a  firm's  marketing  and 
capital  investment  policies  can  produce  unstable  growth  in 
the  market  share. 

All  the  relationships  necessary  to  create  the  growth 
and  stagnation  patterns  encountered  are  included  within  the 
boundaries  of  the  system.  Within  the  closed  boudary26,  the 
system  consists  of  interacting  feedback  loops  as  illustated 
i n  F igure  VI I  -  1  27 

Three  major  feedback  loops  are  shown: 

1.  LOOP  1  is  a  positive  feedback  loop  involving  the 
marketing  effort:  increases  in  orders  booked  increase 
backlog  of  unfilled  orders  which  increase  the  delivery 
rate;  since  sales  are  thus  boosted,  the  marketing  budget 
can  be  increased,  which  means  more  salesmen  and 
therefore  more  orders  booked. 

2.  LOOP  2  involves  delivery  delay  and  sales  effectiveness: 
as  the  backlog  increases,  delivery  delay  increases 
thereby  diminishing  the  firm's  marketing  effectiveness. 

3.  LOOP  3  determines  the  production  capacity:  rising  order 
backlog,  as  indicated  by  delivery  delay,  is  taken  as  an 

25Where  growth  is  the  growth  of  a  new  product 

26  See  section  ( I  - A . 2 ) 

27  From  [17] 
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FIGURE  V 1 1  —  1 :  Major  feedback  loops 
( from  [  17 ] ) 


in  the  Market -Growth  model . 
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indication  of  inadequate  production  capacity.  More 
capacity  is  ordered,  which,  after  an  acquisition  delay, 
increases  production  capacity  and  delivery  rate  and 
therefore  reduces  delivery  delay. 

When  simulated,  the  Market -Growth  model  generates  a 
pattern  of  unstable  market  growth  and  capacity  expansion  as 
illustrated  in  Figure  VII-2.28 

The  Market -Growth  model  is  modest  in  size  (nine  state 
variables),  but  it  is  highly  nonlinear  (because  of  the 
relationships  between  sales  effectiveness  and  delivery  delay 
recognized  by  market,  production  capacity  fration29  and  the 
minimum  delivery  delay  ,  etc...  illustrated  by  nonlinear 
tabular  functions). 

Although  the  model  as  written  in  DYNAMO  contains  a 
great  number  of  relationships  including  Table  fuctions,  (see 
Forrester  [19]),  Senge  [17]  shows  that  for  the  purposes  of 
estimation,  it  can  be  reduced  to  the  following  set  of 
discrete  1 i near- i n- the  parameters  form:  (All  variable 
symbols  are  identical  to  symbols  used  by  Forrester  [19], 
with  the  exception  of  the  variables  PC001,  PC002  and  PC003 
involved  in  the  delay  in  acquiring  production  capacity;  the 
three  variables  are  subsumed  in  the  "production  capacity  on 
order"  PCOO  in  Forrester) 


28  This  figure  is  generated  by  a  DYNAMO  run  taken  from  the 
description  in  Forrester  [19] 

29  a  multiplier  taking  into  account  the  variable  utilization 
of  production  capacity 
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F IGURF  VII-2:  Behavior  of  the  Market -Growth  model 
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The  model  equations  are:30 


dBL  =  DT» [ K 1 «S ( -DT )  +  K2»S(-DT)»DDRM(-DT)  +  K3»BL ( - D  T ) 

+  K4«BL( -DT ) 2 / PC ( - D  T )  +  K5*BL ( -DT ) 3 /PC ( -DT ) 2  +  N1]  (1) 


dDRA  =  DT»[K8»BL(-DT)  +  K9«BL ( -DT ) 2/PC ( -DT )  + 

K10»BL(-DT) 3/PC( -DT ) 2 

+  K1  1«DRA( -DT)  +  N2]  (2) 

dS  =  D  T • [ K 1 2»DRA ( -DT )  +  K13*S(-DT)  +  N3]  (3) 

dPCOO 1  =  DT» [ K 1 4«PC ( -DT )  +  K15«PC(-DT)»DDC(-DT) 

+  K 1 6 • P  C ( -DT ) • D  DC ( -DT ) 2  +  K 1 7 • PC ( - DT ) »DDC ( -DT ) 3 
-  0 . 25»PCOO 1 (-DT)  +  N4 ]  (4) 

dPC002  =  DT» [ 0 . 25»PC00 1 ( -DT )  -  0 . 25»PC002 ( -DT ) ]  (5) 

dPC003  =  DT« [ 0 . 25»PC002 ( -DT )  -  0 . 25»PC003 ( -DT ) ]  (6) 

dPC  =  DT» [ 0 . 25«PC003 ( -DT ) ]  (7) 

DDC  =  ( 1/2) »DDRC  -  0.3  (8) 

dDDRC  =  DT« [ 0 . 2  5 • [ B  L ( -DT )/DRA( -DT  )  -  DDRC(-DT)]  +  N9]  (9) 

dDDRM  =  DT* [ ( 1 /6 ) • [ DDRC ( - DT )  -  DDRM(-DT)]  +  N10]  (10) 


Where:  dBL  =  BL  -  BL(-DT),... 

and  the  symbols  are  explained  in  Table  VI 1-1. 


30  In  Senge' s  documents  [17],  [18]  this  equation  appears 
without  the  term  "  -0 . 25»PC003 ( -DT ) " .  This  is  assumed  to  be 
an  error  since  the  proper  transcr ipt ion  of  a  third-order 
delay  is  as  indicated  here  (see  [12])  and  since  without  this 
term,  the  model  explodes  instead  of  generating  the  expected 
behavior ) 


« 


78 


TABLE  VII-1 

MARKET-GROWTH  MODEL  SYMBOLS 

BL  ( t ) 

order  backlog  (units) 

DRA ( t ) 

delivery  rate  average 

( uni ts /month ) 

S(t) 

number  of  salesmen  (men) 

PC  ( t ) 

production  capacity 

( uni t s/mon th ) 

PC001 ( t ) , 

PC002(t),  PC003(t)  production  capacity  on  order 

( uni ts /month ) 3 1 

DDRC(t) 

delivery  delay  recognized  by 

company  (months) 

DDRM(t) 

delivery  delay  recognized  by 

market  (months) 

DDC(t) 

delivery  delay  condition 

31  third  order  delay:  see 
Forrester  [12]) 
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When  the  Market -Growth  model  is  simulated  as  the  data 
generating  model  "DT"  in  equations  (1)  to  (10)  is  the 
simulation  interval  for  the  experiment,  and  Ni  denotes  the 
noise  input  in  each  equation.  When  the  model  serves  as  the 
estimation  model,  DT  is  the  sampling  interval  of  available 
data,  and  Ni  denotes  the  implicit  error  process  in  each 
equation . 

The  parameters  to  be  estimated  in  the  estimation 
experiments  are  designated  Ki  in  equation  (1)  to  (4).  The 
exact  values  of  the  parameters  (that  appear  in  Tables  V 1 1  - 2 
to  VII-11)  are  taken  from  Senge  [17].  They  often  appear  with 
complex  coterms  (as  in  equations  (1)  or  (2)  )  due  to  the 
series  expansion  (up  to  the  third  order  terms)  of 
Forrester' s  nonlinear  tabular  functions.  Senge  chose  the 
parameter izat ions  which  in  his  estimation  best  drew 
statistically  significant  information  from  the  data,  (see 
[17]  footnote  11). 

B.  EXPERIMENTAL  RESULTS 

B . 1  Ideal  conditions 

The  purpose  of  the  following  runs  is  to  demonstrate 
that  in  principle  OLS  can  yield  accurate  results  fot  the 
Market-Growth  model  (the  specification  of  which  is  very 
different  from  that  of  traditional  econometric  models).  That 
is,  that  despite  the  many  non  1 i near i t i es ,  feed-back  loops, 
and  the  formulation  of  the  model,  the  parameters  of  the 
Market-Growth  model  can  be  estimated  by  traditional 


■ 

■ 
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econometric  means  provided  that  the  specification  of  the 
model  is  right  and  that  the  data  is  free  from  measurement 
er rors . 

1.  Generation  of  the  data 

For  these  ideal  conditions,  small  noise  inputs  are 
added  to  equations  (1),(2),(3),(4),(9)  and  (10). 

Equation  (8)  is  merely  an  identity  and  (5), (6)  and  (7) 
are  restatements  of  the  third  order  delay  in  Forrester' s 
model  for  the  purposes  of  estimation:  therefore  they  do 
not  have  noise  inputs. 

The  noise  sequences  are  generated  as: 

N  =  GGNOF ( ***** ) *0 . 0 1 *MX 

where  GGNOF  is  a  random  noise  generator  from  the  IMSL 
library32  [88],  *****  is  a  seed  number  determining  the 
actual  sequence  of  random  numbers  and  MX  is  the  mean  of 
the  left  hand  side  variable  computed  from  noiseless- 
data.  GGNOF  then  generates  a  normally  distributed  noise 
sequence  with  standard  deviation  equal  to  0.01*MX. 

Seven  runs  were  tried  with  different  noise 
sequences.  The  seed  number  is  the  same  for  all  the 
inputs  in  run  1  to  5.  Each  equation  has  a  different  seed 
number  in  runs  6  and  7.  In  both  cases  the  correlation 
matrix  of  the  six  noise  sequences  for  each  run  indicates 
that  the  hypothesis  of  no  cross-correl at  ion  between 
noise  sequences  is  very  acceptable  (maximum  correlation 
of  .2) 


3  2GGN0F  has  been  deleted  from  version  7  Of  the  IMSL  library 
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The  simulation  interval  DT  was  taken  as  1  (month)  in  all 
runs.  The  initial  conditions  for  each  level  variable  are 
from  Forrester  [19];  PC001 ,  PC002,  PC003  are  assumed  to 
have  the  same  initial  values  as  PCOO  in  Forrester's 
model.  One  hundred  data  points  were  generated  for  each 
run.  All  level  variables  are  measured  (although  only  BL, 
DRA,  S ,  PC001,  PC,  DDC,  DDRM  are  needed  for  estimation). 
The  experimental  conditions  for  each  run  are  summarised 
in  Table  VII-2. 

2.  Estimation  of  parameters 

The  Time  Series  Processor  (TSP)  package  [89] 
available  at  the  University  of  Alberta  was  used  for  all 
OLS  estimations. 

All  variables  are  lagged  once  by  the  estimation  program 
thus  leaving  99  points  of  data  for  the  actual 
est imat ion . 

The  four  equations  with  unknown  parameters  (equations 

(1)  to  (4))  are  estimated  both  in  the  level  variable 

form  and  in  the  first  difference  form  as  follows: 

X  (  t )  =  X  ( t  -  DT)  +  22  b  j  •  X  j  (t  -  DT)  (11) 

X  ( t )  -  X  ( t  -  DT)  =  2C  b  :  •  X ( t  -  DT)  (12) 

!  1  1 

In  both  cases  all  the  parameter  values  are  the  same.  The 
only  difference  is  the  value  of  the  measure  of  goodness 
of  fit  R2  (coefficient  of  goodness  of  fit)  which  is  much 
lower  when  the  equations  are  estimated  in  the  first 
difference  form,  as  explained  by  Maddala  [33,  p.  92]. 

In  the  case  of  equation  (8)  where  the  coefficient  of 
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TABLE  VII-2 

EXPERIMENTAL  CONDITIONS  FOR  TABLES  VII-3  to  VII-6 
IDEAL  SITUATION  -  1%  Equation  noise 


Inital  conditions  for  state  variables  in  Market  Model 


BL  =  8000 

DDC  =  .7 

PC001  =  -1728 

S  =  10. 

DDRC  =  .7 

PC002  =  -1728. 

DDRM  =  2. 

DRA  =  4000. 

PC003  =  -1728. 

PC  =  12000. 

DT  =  1  (month) 

Means  of  variables  used  in  computing  noise  levels 

MBL 

40558 . 5 

MDDRC  =  3.29 

MDRA 

13638 . 5 

MDDRM  =  3.24 

MS 

56 

.  529 

MPC  =  15589.7 

MPC001  = 

1499 . 26 

Noise  Sequences 

seed  numbers 

for  GGNOF  routine 

Run  ft 

Seed  numbers 

for 

1 

12345 

al  1 

2 

5432  1 

al  1 

3 

29875 

al  1 

4 

94732  1 

al  1 

5 

33674 

BL 

2556 

DRA 

1212 

S 

95642 

PCOO  1 

4680 

DDRC 

48801 

DDRM 

6 

947312 

BL 

2345 

DRA 

63  12 

S 

95642 

PC001 

44872 

DDRC 

44352 1 

DDRM 

100  data  points  are  generated.  99  are  available  for  estimation 
There  is  no  measurement  noise. 
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PCO0 1 ( - 1  )  is  not  estimated  (it  is  equal  to  .25  because 
it  is  introduced  by  the  rewriting  of  the  third-order 
delay),  the  regressions  are: 


PC001(t)  -  0 . 25»PC001 ( t  -  DT)  =  ZZ  b-»X;(t  -  DT )  (13) 
PCOOKt)  -  1  . 25»PC00 1  ( t  -  DT)  =  XL  b ,  •  X  ;  ( t  -  DT)  (14) 
For  each  equation,  the  regression  coefficients  are  given 
by: 


b  =  (Z'ZM.Z'y 

where 


(15) 


b  =  vector  of  regression  coefficients  (Nxl) 

Z  =  matrix  of  right-hand  side  variables  (TxN) 
y  =  left-hand  side  variable  (TX1) 

Among  other  output,  the  TSP  package  OLSQ  routine  gives: 

a.  The  t-statistics  for  the  null  hypothesis  that 
individual  regression  coefficients  are  zero: 

t;  =  b;  / Sj  (  16) 

where  s  is  the  estimated  standard  error  for  each 
coefficient . 

A  coefficient  is  accepted  as  being  different 
trom  zero  at  the  95%  confidence  level  if  the 
t-statistic  is  greater  than  2.  Otherwise,  it  is 
"insignificant"  and  the  usual  econometric  practice 
is  to  run  another  regression  without  the  explanatory 
variable  associated  with  that  coefficient. 

b.  The  coefficient  of  goodness  of  fit  R2: 

R2  =  1  -  (e'  «e  /  (y  -  y)'»(y  -  y)  (17) 
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where  the  residuals  e  are  calculated  from: 

e  =  y  -  Zb  (18) 

and  y  is  the  sample  mean  of  y. 

The  goodness  of  fit  R2  measures  the  amount  of 
variation  of  the  dependent  variable  "explained"  by 
the  set  of  regressors.  The  closer  it  is  to  one,  the 
better  the  fit. 

c.  The  Durbi n-Watson  statistic  DW: 

r  T 

DW  ="21{e(t)  -  e  ( t  - 1  ) } 2  /^~e;(  t )  (19) 

t:Z  t=  I 

where  T  is  the  number  of  data  points. 

The  Durbi n-Wa tson  statistic  is  a  measure  of  the 
amount  of  autocorrel at  ion  in  the  least  squares 
residuals  (provided  that  the  residuals  are  normally 
distributed  ).  If  this  autocorrelation  is  equal  to 
1,  DW  =  0 ;  if  it  isO,  DW=2,  and  if  it  is  -1, 

DW  =  4. 

Hence,  if  DW  i sc  lose  to  0  or  4,  we  Know  the 
residuals  e(t)  and  e(t-1)  are  highly  correlated.  A 
value  close  to  2  is  indicative  of  little  or  no 
autocorrelation  in  the  residuals  (noise  sequence). 

3.  Estimation  results 

The  results  of  seven  runs  for  the  ideal  conditions 
are  reported  in  Tables  V 1 1  -  3  to  V 1 1  -  6 .  t-statistics 
appear  in  parenthesis  below  each  estimated  coefficient 
and  the  values  of  R2,  in  level  and  first  difference 
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forms,  and  of  DW  are  given  for  each  equation. 

As  shown  in  these  tables,  in  principle,  the  model 
can  be  estimated  by  OLS.  This  is  the  only  case  where  all 
the  assumptions  of  OLS  are  met: 

1)  E ( u ; )  =0  for  all  i:  zero  mean  for  the  noise 

2)  V ( u ; )  =  S2  for  all  i:  common  variance 
( homoscedast i c  process) 

3)  E(u;,Uj)  =  0  for  i  i  j:  uncorrelated  noise 

4)  E ( u ;  , x j )  =  0  for  all  i,j:  noise  uncorrelated 
wi  th  X 

Parameter  estimates  deviate  only  slightly  from  the  true 
parameter  values,  estimates  are  uniformly  significant  at 
the  99  %  confidence  level  (all  t-statistics  are  above 
2.37  ),  R2  is  close  to  one  in  all  equations  and  DW  close 
to  2  indicates  little  autocorrelation  in  all  equations 
except  equation  (1).  In  this  case,  DW  close  to  one 
indicates  about  50%  autocorrelation  which  according  to 
Senge  [17]  stems  from  the  linearization  of  the 
non  1 i near i t i es  in  that  equation.  However,  the  values  of 
K1  to  K5  are  consistently  worse  that  those  indicated  by 
Senge  [17],  possibly  due  to  this  autocorrelation.  All 
other  estimates  are  comparable  to  or  better  than  those 
given  by  Senge. 

4.  Conclusion. 

Although  only  a  few  runs  were  attempted  in  order  to 
reproduce  Senge' s  "typical"  results,  under  these  ideal 
conditions,  even  though  the  model  is  highly  nonlinear 
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and  includes  many  feedback  loops,  thus  differing  notably 
for  usual  econometric  formulation,  the  results  indicate 
that  OLS  is  in  principle  capable  of  yielding  acceptable 
parameter  estimates  of  the  Market -Growth  model. 

B . 2  Imperfections  in  data  Measurement 

Senge  [17]  argues  that  realistic  imperfections  in  data 
measurement  (10%  error  is  characteristic  of  most  econometric 
time  series;  see  Morgenstern  [90]  )  result  in  highly 
inaccurate  estimates  of  the  parameters  of  the  Market -Growth 
model  taken  as  a  specimen  of  "System  Dynamics"  feedback 
models.  The  following  few  runs  attempt  to  reproduce  some  of 
hi s  resu Its. 

1.  Experimental  conditions. 

Uncorrelated  random  noise,  with  variance  equal  to 
10%  of  the  mean  of  each  measured  level  variable  ,  BL, 
DRA,  S,  PC001,  PS,  DDRC ,  DDRM ,  ( DDC  =  1/2*DDRC  -  0.3), is 
now  added  to  these  variables. 

All  the  other  experimental  conditions  remain  identical 
to  those  presented  in  section  ( V 1 1  - B ) .  Although  all  the 
assuptions  of  OLS  are  not  met  in  this  experiment,  OLS 
estimation  is  still  used  here. 

The  noise  inputs  are  added  to  the  relevant  variables  as: 

X'i  =  Xi  +  V;  (20) 

where : 

y.  =  GGNOF  (*****) *0 .  1*MX;  (21) 

Table  V 1 1  -  7  summarises  the  experimental  conditions  for 


the  six  runs. 
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TABLE  V I  I  - 7 

EXPERIMENTAL  CONDITIONS  FOR  TABLES  VII-8  to  VI I - 1 1 
10%  MEASUREMENT  ERRORS 
Initial  conditions  same  as  in  Table  VII-2 
Means  of  variables  same  as  in  Table  VII-2 
Noise  Sequences:  seed  numbers  for  GGNOF  routine 
Run  #  Equation  noise  Measurement  for 


no  i  se 

as 

in  previous 

66751 

BL 

run  5 

93023 

DRA 

10735 

S 

76925 

PC001 

5675 

DDRC 

22768 

DDRM 

3  1  356 

PC 

as 

in  previous 

6675  1 

BL 

run  6 

4202  1 

DRA 

10735 

S 

16305 

PC001 

5653 

DDRC 

1763 

DDRM 

39804 

PC 

as 

in  run  1 

as  in  run  2 

al  1 

as 

in  run  2 

as  in  run  1 

al  1 

94732 1 

94732  1 

al  1 

5 
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2.  Estimation  results. 

The  results  in  Tables  V 1 1  - 8  to  VII-11  indicate  the 
severe  deter iorat ion  of  the  estimates  due  to  the  modest 
10%  measurement  noise. 

Statistically  significant  estimates  in  equations 
(1)  to  (3)  deviate  from  their  true  value  by  factors 
ranging  from  2  to  9 . 

■  Equation  (1)  is  the  most  accurately  estimated:  all  the 
parameters  in  this  equation  fall  within  a  factor  of  two 
of  the  real  values  and  most  are  statistically 
significant  at  the  99%  confidence  level  ( t-stat ist ics 
exceed  2.37) 

■  Except  for  the  odd  discrepancy,  most  parameters  in 
equation  (2)  are  also  within  a  factor  of  two  to  three  of 
their  real  values,  are  significant  at  the  99% 
probability  level  and  are  on  the  average  much  better 
than  those  indicated  by  Senge  for  that  equation. 

■  The  most  inaccurate  results  occur  in  equation  (3) 
where  the  statistically  significant  estimates  of  K10  are 
in  error  by  a  factor  of  4  (in  average)  and  those  of  K 1 1 , 
by  a  factor  of  5. 

■  All  estimates  in  equation  (4)  have  the  wrong  sign  and 
are  statistically  insignificant  at  the  95%  level.  Thus 
the  modeler  is  warned  of  the  error;  however,  he  has  no 
way  of  improving  on  the  specification  since  it  is  alredy 
a  perfectly  specified  model. 
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3.  Conclusion. 

Although  Senge  reports  results  of  GLS  estimation, 
which  he  finds  much  better  than  the  OLS  ones,  the  OLS 
results  obtained  here  for  equations  (1)  and  (2)  are  on 
the  average  much  better  than  Senge' s  thus  putting  in 
doubt  the  necessity  to  use  GLS.  The  results  for  equation 
(3)  are  worse  than  his,  however.  Even  though  all  the 
coefficients  in  equation  (4)  have  the  wrong  sign  here, 
their  statistical  significance  is  comparable  to  that 
obtained  by  Senge:  in  both  cases  the  modeler  is  informed 
that  the  values  are  wrong  or  insignificant.  Mass  and 
Senge  [70]  discuss  in  detail  the  fallacy  off  viewing 
measures  such  as  the  t-test  as  tests  of  specification.33 
These  results,  although  inex tensive,  prove  beyond  doubt 
Senge' s  claim  that  it  is  unrealistic  and  meaningless  to 
try  to  estimate  system  dynamics  model  using  such  methods 
as  OLS  given  that  all  available  data  contains  at  least 
10%  measurement  error. 

Senge  goes  on  investigating  the  effect  of  the  relative 
size  of  the  measurement  and  equation  noise  inputs.  He 
finds  that  increasing  the  equation  noise  to  10%,  as 
might  be  suggested  by  econometric  theory34,  does  not 
help  improve  the  accuracy  of  the  estimates  (in  the 
contrary  some  are  deter iorated ) . 

Considering  that  the  experimental  conditions  are  still 


33  See  section  (IV-C.1) 

34  See  [17] 
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very  ideal  (all  data  available,  100  data  points  and 
perfectly  specified  model),  he  further  demonstrates  that 
a  realistic  speci f i cat i i on  error  in  equation  (1)  results 
in  "meaningless"  estimates...  However,  since  we  have 
limited  ourselves  here  to  the  investigation  of  the 
effect  of  measurement  noise  on  the  estimates  and  their 
possible  improvement  by  alternative  methods,  we  wi 1 1  be 
considering  only  mesurement  errors  in  the  sequel. 


■ 

. 


VIII.  ALTERNATIVE  ECONOMETRIC  METHODS 


A.  INTRODUCTION 

Of  all  the  econometric  methods,  the  most  used  is  the 
Two-Stage  Least  Squares  technique  (2SLS)  which  was  developed 
specifically  to  avoid  the  shortcomings  of  OLS  in 
"simultaneous"  equations  systems.  Its  behavior  has  been 
thoroughly  investigated  and  it  performs  well  in  most 
econometric  problems,  even  in  the  case  of  dynamic  models 
with  lagged  variables  on  the  right-hand  side  of  the 
equat i ons . 

This  single  equation  method  would  seem  to  have  been 
able  to  perform  at  least  better  than  OLS  with  a  dynamic 
model.  It  has  been  used  by  Mehra  [24]  on  a  state-space  model 
to  determine  model  structure  prior  to  the  estimation  of 
parameters  using  Maximum  Likelihood. 

Unfortunately,  the  nature  of  System  dynamics  models, 
where  all  right-hand  side  variables  are  lagged,  and  where, 
all  the  behavior  of  the  model  being  generated  within  the 
boundaries,  there  are  no  external  inputs,  rules  out  the  use 
of  2SLS . 

Referring  back  to  section  (III-B.2),  the  2SLS  method 
goes  around  the  problem  of  OLS  bias  by  first  estimating  the 
reduced  form35  of  the  system  by  OLS  (which  yields  an 
unbiased  estimate),  and  then,  by  replacing  the  endogenous 
variables  in  the  right-hand  side  of  equation  i  by  their 

35  see  p.  32 
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estimates  obtained  from  the  reduced  form.  Since  system 
dynamics  models  are  already  in  the  reduced  form,  (there  are 
no  contemporaneous  endogenous  variables  in  the  right-hand 
side  of  the  equations),  the  2SLS  method  is  not  applicable. 

This  situation  illustrates  very  well  the  debate  over 
"simultaneous"  equations  system  versus  "recursive"  equations 
systems  that  has  prevailed  for  the  last  30  years  in  the 
econometric  literature  and  is  is  still  very  live,  by  showing 
that  the  application  of  econometric  estimation  techniques  to 
system  dynamics  models  raises  fundamental  issues  as  to  the 
"right"  structure  to  represent  the  "real-world". 

Mehra  [24]  states  that  there  is  very  little  advantage 
in  engineering  in  using  "simultaneous"  equations  systems, 
based  on  an  analysis  of  the  state-vector  form  of  these 
models.  Wold  [91]  has  argued  for  several  years  that  an 
appropriate  formulation  of  economic  models  necesserily  leads 
to  recursive  systems  (since  for  Wold  "the  world  is 
recursive")36.  Fisher's  view  [92]  is  that  simultaneous 
models  are  limiting  approximations  to  non- si  mu  1 taneous 
models  when  certain  time  lags  converge  to  zero.  Senge  [93] 
summarizes  this  approach  very  well  and  further  elaborates  on 
the  issue  in  the  light  of  the  state-space  approach... 

Being  unable  to  use  2SLS,  the  only  other  econometric 
method  of  interest  available  was  the  LISREL  model37 


36  This  is  also  Forrester' s  view 

37  The  work  of  Senge  was  extended  to  the  investigation  of 
the  performance  of  IV  estimators  on  the  same  model  by  J. 
Morecroft  [94],  but  with  only  5%  measurement  noise.  The 
results,  although  better  than  the  OLS  ones,  are  far  from 


, 
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B.  LISREL  RUNS 

The  LISREL  method,  described  in  section  1 1 1  - D ,  was 
tried  with  several  smaller  scale  models  (as  described  in 
section  (X-B)),  as  well  as  with  the  Market -Growth  model,  in 
order  to  investigate  its  behavior  with  the  presence  of 
lagged  variables  and  specification  of  the  order  of 
nonlinearity  found  in  the  Market -Growth  model. 

B.1  The  I dent i f iabi 1 i ty  Problem 

The  solution  of  the  i dent i f i abi 1 i ty  problem  (see  [64]) 
is  often  complicated  and  tedious;  explicit  solutions  for  all 
8' s  seldom  exist.  It  is  often  difficult  to  determine  whether 
or  not  a  parameter  is  identified  or  whether  or  not  the  whole 
model  is  identified. 

However,  the  program  computes  the  information  matrix  at 
the  starting  point  of  the  F 1  etcher -Powel 1  search.  If  this 
matrix  is  positive  definite,  it  is  almost  certain  that  the 
model  is  identified  [64].  On  the  other  hand,  if  it  is 
singular,  the  model  is  not  identified  and  an  error  message 
indicating  a  possible  unidentified  parameter  in  the 
paramater  vector  is  printed  out. 

B.2  Experimental  results 

The  program  failed  to  converge,  or  displayed  various 
singularity  or  other  fatal  erros,  even  with  sipler  models 
and  since  the  actual  listing  was  not  available, 


37 ( cont' d )bei ng  satisfactory. 
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troubleshooting  was  very  limited.  It  is  a  general  weakness 
of  "all-purpose"  programs  that  they  are  hermetic  to  all 
except  their  "creator". 

In  view  of  their  lack  of  interest  and  to  keep  this  thesis 
concise,  the  results  are  not  reported  here. 

The  only  alternative  estimation  technique  open  for 
investigation  was  the  Filtering  Form  of  the  Maximum 
Likelihood  algorithm,  which  is  explained  in  detail  in  the 
next  section. 


IX.  THE  FILTERING  FORM  OF  THE  LIKELIHOOD  FUNCTION 


A.  PHILOSOPHY  OF  THE  MAXIMUM  LIKELIHOOD  METHOD 

The  Maximum  Likelihood  (ML)  method  was  first  introduced 
by  R.  A.  Fisher  in  1912  as  a  general  method  for  statistical 
parameter  estimation. 

It  can  be  explained  as  follows  [22]. 

Let  (X, . ,Xn)  be  a  random  sample,  and  let  b  be  the 

object  of  estimation.  Then,  the  joint  density  function  of 
the  sample  evaluated  at  the  sample  values  is: 

f  (  X  t  /b )  • - «f  (Xn/b)  =  L  ( X , . Xn;b)  (  1  ) 

The  product  of  the  f ' s  on  the  left  side  specifies  the 

probability  density  of  obtaining  the  sample  (X, . Xn) 

given  that  b  is  the  parameter  value  (if  the  samples  are 
independent).  This  product  is  written  as  L  (  .  )  .  which  is 
known  as  the  Likelihood  function. 

The  main  reason  for  the  introduction  of  this  function  is 
that  the  expression  in  equation  (1)  can  also  be  regarded 
as  a  function  of  b,  given  the  sample,  i.e.,  given  that 
the  sample  has  been  drawn,  the  value  of  the  likelihood 
depends  on  b. 

The  method  of  Maximum  Likelihood  proposes  to  use  as 
an  estimate  of  b,  the  function  Jh(X,  .  .  .  .Xn)  which 
maximizes  the  Likelihood  function: 

L  (  X , . X  n;£>)  >  L  (X  , . Xn;c)  (2) 

where  c(X1...Xn)  is  another  estimator. 
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The  underlying  intuitive  idea  is  that  one  should 
prefer  a  parameter  value  that  implies  a  large 
probability  of  the  sample  drawn  rather  than  a  parameter 
value  that  declares  that  the  sample  is  less  probable. 

B.  MAXIMUM  LIKELIHOOD  AND  IDENTIFICATION 

In  practice,  there  are  two  major  problems  in 
obtaining  Maximum  Likelihood  estimates  for  dynamic 
systems.  These  are: 

1.  Deriving  an  expression  for  the  Likelihood 
function,  and 

2.  Maximizing  the  Likelihood  function  with  respect 
to  the  unknown  parameters. 

...  The  Likelihood  function  is  the  logarithm  of  the 
joint  probability  density  of  the  observations  given 
the  parameters.  If  the  observations  are  independent, 
the  joint  probability  density  function  is  easily 
written  down  since  it  is  just  the  product  of  the 
probability  densities  of  each  observation  given  the 
parameters.  The  derivation  of  the  Likelihood 
function  becomes  much  more  dificult  when  the 
observations  are  correlated.  This  is  necessarily  the 
case  for  dynamic  systems  with  random  inputs  since 
the  state  at  any  time  is  correlated  with  the  state 
at  all  the  previous  times". 

However,  it  will  be  shown  in  section  (IX-D)  that  the 

Likelihood  function  for  a  dynamic  system  can  be  derived  in  a 
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simple  form  using  the  Kalman  Filter  and  the  resulting  white 
noise  innovation  sequence38. 

"...  The  second  problem  in  obtaining  Maximum 
Likelihood  estimates  is  a  computational  one. 

Generally,  the  Likelihood  function  is  highly 
nonlinear  in  terms  of  the  parameters.  For  finite 
data  lenghts,  it  is  also  known  to  have  several  local 
maxima.  In  the  case  of  dynamic  systems,  certain 
differential  equation  constraints  have  to  be  kept 
satisfied.  The  choice  of  a  suitable  search  algorithm 
is  very  important  for  the  successful  application  of 
Maximum  Likelihood  identification."  [84] 

The  method  originated  in  the  work  of  Astrom  [95], 
Kashyap  [96]  and  Mehra  [97]  and  is  capable  of  solving  the 
most  general  identification  problem,  including  systems 
governed  by  nonlinear  differential  equations,  the  presence 
of  additive  white  noise  in  the  equations,  and  random 
disturbances  corrupting  measured  variables. 

The  first  application  of  the  Maximum  Likelihood  method 
to  system  identification  is  due  to  Astrom  and  Bohlin  [98] 
who  considered  single- input-single-output  (SISO)  and  later 
mul tiple- input-single-output  (MISO)  systems  of  difference 
equation  of  the  form: 


38  See  section  (IX-E)  for  description  of  the  Kalman  Filter 
and  the  definition  of  "innovation". 
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( 1+a , .q- 1 + . . . .+a  .q“n )»y( t ) 


+  !•( 1+c,  . q~ 1 + . +cn .  q~n )»e(t) 


(3) 


or 


p 


A  (q-  1  )  •  y  (  t )  =  2Z  B  (q-i)«u;(t)  +  1  «C  ( q- 1  )  *e  ( t )  (4) 


t  c  \ 


where : 


q:  is  the  shift  operator 
p:  the  number  of  inputs 

e(t):  is  a  sequence  of  independent  gauss i an  noise 
1:  level  of  the  noise  signal 

A  more  efficient  computational  procedure  was  derived  by 
Eaton  and  presented  two  years  later  [99]. 

The  generalization  to  identification  using  state-space 
models  has  been  presented  by  Caines  [100]  and  Woo  [101],  for 
example,  at  the  second  IFAC  symposium  on  Identification  and 
Process  Parameter  Estimation  in  1970. 

The  linear  form  of  these  models  is  the  classical: 


(5) 


x(t+1)  =  A • x ( t  )  +  B»ut  +  G»w(t) 


(6) 


z ( t  )  =  H •  x  (  t  )  +  v(  t  ) 


where  x.  u,  z,  w,  v,  are  as  defined  in  section  (IX-E). 

The  purpose  of  identification  is  to  find  a  model  M(b) 
that  in  some  sense  suitably  describes  the  measured 
input-output  data.  Thus,  the  identification  method  depends 
largely  on  the  criterion  used  to  define  "suitability  . 

The  prediction  of  z(t+1)  plays  an  important  role  for 
control.  In,  e.g  Linear  Quadratic  Control  Theory,  the 
optimal  input  signal  shall  be  chosen  such  that 
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E{z( t+1 ) /Z( T ) }  (where  Z(T)  =  {z ( 0 ) . . . z( t ) } )  has  desired 
behavior . 

Therefore,  it  is  very  natural  to  choose  a  model  that 
gives  the  best  possible  prediction. 

That  is,  some  fuction  of  the  prediction  error: 

e(t+1)  =  z(t+1)  -  E{z(t+1 )/Z(T) , M ( b ) }  (7) 

should  be  minimized  with  respect  to  b. 

Ljung  [102]  shows  that  prediction  error  criteria  are 
intimately  connected  with  the  Maximum  Likelihood  approach. 
This  is  also  done  by  Mehra  [84]  using  the  innovations 
approach3  9 . 

The  interest  of  the  Maximum  Likelihood  approach  lies  in 
the  fact  that  its  formulation  involves  not  only  the 
prediction  error  terms,  but  also  the  covariance  matrix  of 
those  terms,  thus  allowing  us  to  use  the  maximum  information 
avai 1  able . 

It  is  interesting  that  the  Kalman  Filter  (to  be  discussed  in 
section  ( I X - E )  )  yields,  in  a  situation  where  both  equation 
and  measurement  noise  are  present,  the  very  terms  that  are 
needed  for  the  computation  of  the  Maximum  Likelihood 
function,  namely,  the  prediction  error  and  its  covariance 
matrix. 

It  was  a  natural  step  therefore  to  associate  the  Kalman 
Filter  with  the  Maximum  Likelihood  algorithm.  This  coupling 
resulted  in  what  has  come  to  be  known  as  the  "Filtering  Form 
of  the  Maximum  Likelihood  Algorithm". 


39  See  section  (IX-F) 
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By  1973,  at  the  third  IFAC  Symposium  in  the  Hague  [84], 
participants  were  in  the  position  to  present  numerous 
applications  of  this  method  to  the  identification  of  actual 
problematic  cases  and  show  the  power  of  this  algorithm  to 
perform  effectively  in  a  situation  where  other  estimators, 
and  OLS  in  particular,  yield  biased  estimates  because,  among 
other  factors,  of  the  measurement  noise. 

These  include  applications  of  the  Maximum  Likelihood 
algorithm  to  the  identification  of  drum  boiler  dynamics 
[84,  p.  87],  aircraft  paprmeters  [84,  p.  117],  ship  dynamics 
[84,  p.  415],  power  generator  dynamics  [84,  p.  367],  nuclear 
power  and  reactor  dynamics  [84,  p.  375],  to  quote  only  a 
few. . . 

More  recently,  Ljung  [102]  solved  the  i dent i f i abi 1 i ty 
and  consistency  problems  for  the  general  linear,  multi¬ 
dimensional  t ime- i nvar i ant  system  under  quite  general 
assumptions,  within  the  framework  of  prediction  error 
identification  methods.  These  results,  as  well  as  some  other 
strong  theorems  about  the  multivariate  case,  have  been  used 
by  e.g.  Olsson  [103]  in  further  experiments  in  modeling  and 
identification  of  a  nuclear  reactor. 

An  interesting  development  of  the  Maximum  Likelihood 
algorithm  is  found  in  the  work  of  Akaike  [77],  who  recently 
motivated  the  use  of  Kullback's  information  criterion  [74] 
in  the  context  the  Maximum  Likelihood  approach.  He 
introduced  an  information  criterion  (AIC)  for  both 
estimation  and  statistical  model  determination  defined  by: 
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AIC  =  -2. (maximum  log- 1 i Ke 1 i hood )  +  2. (number  of 

independently  adjusted  parameters)  (8) 

When  several  competing  models  are  being  fitted  by  the  method 
of  Maximum  Likelihood,  the  one  with  the  smallest  number  of 
parameters  (the  smallest  value  of  AIC)  is  chosen  as  the  best 
model.  The  final  estimate  is  called  the  minmum  AIC  estimator 
(MAICE).  This  is  a  very  elegant  method  for  determining  the 
structure  of  a  representation  for  noisy  input-output  data 
and  proves  superior  to  traditional  hypothesis  testing 
methods . 

11  By  the  introduction  of  MAICE  the  problem  of 
statistical  identification  is  explicitly  formulated 
as  a  problem  of  estimation  and  the  need  for 
subjective  judgement  required  in  the  hypothesis 
testing  procedure  for  the  decision  on  the  level  of 
significance  is  completely  eliminated."  [78] 

Rissanen  [79]  further  expands  on  this  theme  and 
develops  a  new  criterion  for  mode  1 -structure  determination 
based  on  the  concept  of  entropy. 

The  following  sections  give  an  intuitive  description  of 
this  algorithm  referred  to  in  this  thesis  as  LIKALM,  for 
"LIKelihood  using  the  KALMan  filter",  a  brief  summary  of  the 
Kalman  Filter,  and  a  formal  derivation  of  the  LIKALM 
algorithm.  The  method  is  first  derived  for  a  simple 
one-dimensional  example.  The  general  equations  for  the 
Kalman  Filter  and  the  LIKALM  algorithm  are  then  given. 
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C.  AN  INTUITIVE  APPROACH  TO  THE  FILTERING  FORM  OF  THE  ML 
ALGORITHM 

The  difference  between  the  Least  Squares  and  the  LIKALM 
approach  can  be  illustrated  by  a  simple  one  dimensional 
examp  1 e . 

Let  us  assume  the  following  model: 

x ( t+  1  )  =  b»x ( t )  +  u  ( t )  (  9  ) 

z( t+1  )  =  x( t+1  )  +  v( t+1 )  (10) 

where  u ( t )  and  v(t)  are  two  independent  gaussian  sequences 
with  Known  covariances: 

E [ u ( t ) ,u( t' ) ]  =  Q  iff  t  =  t '  (11) 

E [ v ( t ) , v( t '  )  ]  =  R  iff  t  =  t'  (12) 

C.1  OLS 

The  OLS  regression  assumes  the  model : 

z(  t+ 1  /t )  =  £>•  z  ( t )  (13) 

and  choses  the  estimate  b  of  b  such  that  the  estimation 
cr i ter  ion 

r 

S  =  22  [  z(t+1)  -  £>»z(  t )  1 2  (14) 

tc  l 


is  minimized. 

It  is  easy  to  show40  that  the  value  of  b  that  minimizes 
this  criterion  is: 

fc>  =  ZT  [z(  t+1 )  -  z ( t )  ]  /  21  z ( t ) 2  (15) 

t~!  £  -  • 


40 See  section  (II-B.2) 
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the  well-known  Least  Squares  estimate. 

This  method  is  a  "one  shot"  method  in  that  it  assumes 
all  the  data  available  and  computes  b  by  one  pass  trough  the 
data,  since  there  is  an  analytic  expression  for  b. 

To  make  the  comparison  with  LIKALM  more  obvious,  we 
will  assume  here  another  interpretation  of  OLS. 

Assuming  that  it  were  impossible  to  find  a  closed  form 
solution  to  the  minimization  problem,  we  would  have  to  use 
an  iterative  convergence  algorithm  to  find  b.  Starting  with 
a  guessed  value  b0  ,  we  would  simulate  the  model: 

z(t+1/t)  =  bQ  .z(t)  ( 16) 

staring  at  the  first  data  point;  then  compute  the  criterion: 

T 

S(z .,£>„)  =  ZZ  [Z(t+1/tl  -  z(t  +  1)]2  (17) 

t  trr/ 

and  compute  the  gradient: 


dS 
d  b 


=  -2* 


£>=£>„ 


T 


7 


e ( t ) »z ( t ) 


(18) 


If  we  are  using  the  Newton- Raphson  algorithm  (for  example), 
we  would  also  compute  the  second  derivative: 


d 2  S 
6b2 


-2' 


ts  I 


z  ( t )  2 


b=b 


(  19) 
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We  would  then  choose  £>,  as: 


dS  d2S 

—  ) 
6b  dt)2 


b-b 

o 


(20) 


and  reiterate  the  whole  process  until  some  convergence 
cr i ter  ion  is  met . 

The  simulated  trajectory  for  one  iteration  on  b  is 
shown  in  Figure  I X - 1 .  The  trajectory  is  reinitialized  at 
each  data  point . 

This  works  well  when  there  is  no  measurement  noise. 
However,  even  small  measurement  errors  lead  to  inaccurate 
estimates.  This  is  best  explained  by  Peterson  [85]. 

"The  intuitive  reason  for  the  failure  is  as  follows. 
The  motivation  for  reinitializing  the  model  at  each 
data  point  was  to  Keep  the  simulation  close  to  the 
true  state  of  the  system,  so  that  any  divergence 
between  model  and  data,  as  measured  by  the 
residuals,  would  be  meaningful.  But  reinitialization 
at  noisy  data  is  unlikely  to  keep  the  simulated 
state  close  to  the  real  state.  Large  residuals  may 
result  from  even  a  perfect  model,  under  OLS." 

C . 2  LIKALM 

The  LIKALM  approach  differs  from  the  above  scheme  in 
two  points:  the  reinitialization  of  the  simulatiom  at  each 
data  point  and  the  estimation  criterion. 
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A 


actua 1  data  Z  ( t ) 


simulation  trajectories 


-I - 1 - 1 - 1 - 1 - 

1  2  3  4  5 


Time  t 


FIGURE  I X - 1  :  Reinitialization  at  each  point  in  time  in  OLS  simulation 
( from  [ 85 ]  ) 


Reinitialization  at  each  data  point 

For  each  data  point,  the  Kalman  Filter  (defined  in 
section  ( I X - E )  )  yields  the  "best"  estimate  of  the  true 
state  x ( t )  as  well  as  the  variance  of  that  estimate  Sx, 
using  all  the  data  available  up  to  and  including  time  t. 
Calling  this  estimate  x(t/t),  the  prediction  of  the  next 
measurement  point  z(t+1),  denoted  z(t+1/t),  is  then 
computed  as  in  Least  Squares  by: 

z(t+1/t)=6;.x(t/t)  (21) 

for  iteration  i  on  the  parameter  value. 

Thus,  at  each  step,  the  simulation  is  reinitialized  at 
the  "best"  point  (the  point  in  which  the  state  is  most 
likely  to  be,  according  to  the  Kalman  Filter). 

The  "best"  estimate  of  the  state  x(t/t)  at  time  t 
is  given  by: 

SK(t/t-1 ) 

x  ( t  / 1 )  =  z(t/t-1)  +  - •  [  z  ( t )  -  z(  t/t- 1  )  ]  (22) 

S  ( t  / 1  -  1  ) 


guessed 

measurement 


gain  or 
cor rect i on 


error  or 
i nnovat i on 


where  is  the  covariance  of  the  innovation. 


The  process  is  illustrated  in  Figure  I X - 2 . 

The  estimation  criterion 

The  Kalman  filter  yields  not  only  the  the  residual 
or  "innovation"  e(t)  =  z(t+1/t)  -  z(t+1)  at  each  point, 


, 
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FIGURE  IX-2:  Reinitialization  at  each  point  in  time  in  LIKALM  simulatiton. 
( from  [ 85 ] ) 
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but  also  the  covariance  matrix  of  that  innovation, 

S  (t+1/t)  . 

Z  T 
The  Least  Squares  criterion,  S  =  2  [ e ( t ) ] 2 ,  would  not 

t~> 

make  use  of  this  additional  information.  Interestingly, 
the  variance  of  the  white  noise  process  e(t),  noted 
S^( t ) ,  is  precisely  used  by  the  Maximum  Likelihood 
cr i ter i on : 


L  =  K 


r 


t  Z  I 


e2  ( t ) 

[  Log  |  S  |  + -  ] 

*  S 


(23) 


making  the  use  of  this  criterion  more  efficient  in  the 
light  of  the  information  available  from  the  Kalman 
Fi Iter. 

(The  derivation  of  this  criterion  is  left  to  section 

(IX-F)  .  ) 

A  simple  illustration  of  these  results  with  a  one 
dimensional  model  is  included  with  the  experimental 
resu 1 ts . ( Sect i on  (X-B.1)) 

D.  DERIVATION  OF  THE  LIKALM  ALGORITHM  FOR  A  ONE-DIMENSIONAL 
MODEL 

D.1  Statement  of  the  problem 
Consider  the  following  structure: 

x ( t+ 1 )  =  b«x ( t )  +  u ( t )  ( 24 ) 

z(  t+1 )  =  x(t+1  )  +  v(t+1 )  (25) 

with  b  unknown,  u ( t )  and  v(t)  zero  mean  gaussian  noise 
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sequences  with  Known  covariances: 


E{u( t ) ,u( t' ) }  =  s^2  =  q 

iff  t  =  t' 

(26) 

E{v(t),v(t/ )}  =  s^2  =  r 

iff  t  =  V 

(27) 

We  wish  to  analyze  the  performance  of  the  scheme,  for 
estimating  b,  that  uses  a  Kalman  Filter  to  estimate  x(T)  and 
Maximum  Likelihood,  given  { z ( 0 ) ,  z(1), . .  z ( T ) } 

D.2  Solution  of  the  problem 
1.  The  Likelihood  function 

Given  {z(0),  z ( 1 ) .  z ( T ) } ,  and  if  we  define  this 

set  of  independent  sample  values  up  to  time  T  as: 

Z(T)  =  {z(0),  z( 1 ) . . .  z ( T ) } ,  we  then  have  the 

following  equalities  for  the  Likelihood  function,  based 
on  the  assumption  of  normal  distributions. 

L { [z ( 0 ), z ( 1 z( T ) | b ] }  =  p { z ( 0 ) , z ( 1 ) . z ( T ) | b } 

=  p{z(0) |b}»p{z( 1 ) | z ( 0 ) , b } . »p{z(T) | Z ( T - 1 ) , b} 

=  p{z ( T ) | Z ( T -  1 ) ,b}  (28) 

so  that  L  is  now  proportional  to: 


L 


I  I  - exp 

iso  |  s 2  (  t  )  | 


[z( t )  -  z(  t ) ] 2 


2»s 


2 

Z_ 


}  (29) 


where : 

z(  t)  =  E{z(t) | Z ( t - 1 ) , b }  (30) 

s£(t)  =  E{[z(t)  -  z( t ) ] 2 | Z ( t- 1 ) , b)  (31 ) 

Taking  the  negative  logarithm  of  L,  we  wi 1 1  minimize  the 
cr i ter  ion : 
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r 


[z(t)  -  Z(t) ]2 


(32) 


wi  th  respect  to  b . 

2.  The  Kalman  Filter 

The  ideas  of  Kalman  Filtering  can  be  put  in  the 
following  simple  form. 

Supposing  b  Known,  we  wish  to  estimate  x(t+1)  from 
Z(t)  =  {z ( 0 ) , z ( 1 ) , . . . z ( t ) } .  To  do  this  recursively,  we 
suppose  that  x(t)  has  previously  been  estimated  from  Z ( t - 1  )  . 
Then  the  two  items  of  information  available  to  us  for  the 
estimation  of  x(t+1)  are: 

i)  The  "old"  estimate: 

{x( t ) | Z ( t -  1 ) }  =  E {  x ( t ) | Z ( t -  1 ) }  or  x(t/t-1)  (33) 

ii)  The  "innovation"41  that  is  observed  at  time  t: 

e(t)  =  z ( t )  -  {z(t) |Z( t-1 )}  =  z ( t )  -  ix(t) | Z ( t - 1 )} 

(34) 

The  last  equality  holds  because  E { v ( t ) }  =  0. 

It  can  be  shown  that  these  two  items  are  statistically 
independent,  using  the  conditional  expectation  theorem  and 
the  fact  that  they  are  Gaussian. 

Consider  now  the  two  zero-mean  Gaussian  variates  x  and 
z  defined  as: 


4  1 


See  p.  130 


x  =  x( t+1  )  -  E { x ( t +  1  )  |  Z ( t  -  1  )} 
z  =  e  ( t ) 
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(35) 

(36) 

If  z  were  not  oberved,  then  E{x}  =  E{x|Z(t-1 )}  would  be 
zero.  When  z  is  observed,  the  extent  to  which 
E { x | z }  =  E { x | Z ( t ) }  differs  from  zero  is  the  "correct  ion" 
that  must  be  added  to  E { x ( t+  1  )  | Z ( t - 1  ) }  to  obtain  x(t  +  1),  in 
the  ligth  of  the  new  information. 

To  obtain  an  expression  for  E { x | z } ,  let  us  consider  the 
joint  dens i ty : 


p  (  x  ,  z  ) 


1 


2  T7 


•  exp  [  A  ] 


(37) 


where : 

A  = 


2(fxz 

sx#s* 


]} 


(38) 


and  s^  and  s^  are  variances,  C  =  sX2/  (sx»s^)  is  the 
coefficient  of  correlation. 

The  conditional  density  is  then: 


p( x | z ) 


p( X ,z ) 
p  ( z ) 


exp[  B  ] 


(39) 


where : 


2(1  -  <j> 


2  <j>xz 
s  »s. 


s! 


B 


{ 


( 1  -  f  2) 


}  (40) 


J 
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where  the  quantity  in  braces  is  a  perfect  square 


{.}  =  ---  (x  -  <f  z) 

s. 


(41  ) 


Then : 

E { x | z }  =  ^  x»p(x|z)»dx 

=  J  [  x  -  •  (  sx /sz  )  «z  ]  »p  (  x  |  z  )  »dx 

+  ^  •(sx/s2)»z  •  J'  p(x|z)«dx 
=  f  •(sK/sz)»z  =  (  sxz  /  S2)«Z 

Thus  the  optimum  estimate  obeys: 

x(t+1)  =  E {  x(t+1)  |  Z(t)  } 

=  E  {  x( t+1  )  |  Z ( t - 1  )  }  +  (s  xz  /  s2)*e(t) 


where : 


=  b»x( t)  +  ( s  /  s 2 ) »e ( t 

*  z  z 


s2  =  E{  e2(t)  }  =  E{  [  x(  t )  -  x(t)  +  v( t ) ] 2 } 

=  s 2  ( t )  +  r 
y 


and 


(42) 


(43) 


(44) 


sxz  =  E{[x(t+1)-  E[x(t+1)|Z(t-1)]]*[x(t)  -  x(t)  +  v ( t ) ] } 


=  E{[  b*x(t)  +  u(t)  -  b»x(t)]»[  x(t)  -  x(t)  +  v ( t ) ] } 

(45) 

Thus  the  Kalman  Filter  recursive  equation  is  in  this  simple 


=  b»s 2 ( t ) 
x 


case : 


sJ(t 


s2  ( t )  +  r 


x( t+ 1 )  =  b»x( t )  +  b» 


[  z( t )  -  x( t ) ] 


(46) 
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We  also  need  a  similar  recursive  equation  for  s2(t+1),  which 
is  given  by: 


s2(t+1)  =  E {  [x(t+1)  -  x(t+1)]2  } 

=  E{[b»x(t)  +  u(t)  -  b»x(t)  -  ( s  /s 2 ) *e ( t ) ] 2 } 

-  b2»s2(t)  +  q  -  (s2/s4)»s2 

X  ^  XZ  z  z 

=  b2»s2(t)  +  q  -  b2*[  sM  t)/  (s2  +  r  )]  (47) 


or  : 


s 2 ( t ) »r 


s2 ( t+1  )  =  b2» 


(48) 


s 2  (  t )  +  r 
x 


It  can  be  easily  shown  that  the  solutions  of  the  variance 
equation  are  uniformly  bounded  for  all  t  and  that  the  "loop 
gain"  of  the  filter  equation  is  less  than  one  if  the  initial 
model  is  stable:  therefore,  the  Kalman  Filter  is  internally 
stable . 

3.  Minimization  of  the  Likelihood  function 

Now,  to  minimize  the  negative  Likelihood: 


r 


[z( t )  -  z( t ) ] 2 


Log  L  = 


{  Log  | sz ( t )  |  + 


2«s|( t ) 


(49) 


with  respect  to  b,  we  set: 


d 


{  -Log  L  }  =  0 


d  b 


(50) 


. 
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which  yields: 


x 


1  d  s  (t)  [z(t)  -  z(  t )  ]  dz(  t ) 

- .  -  .  2  -  . - 

s  ( t )  d b  ds 2 ( t )  db 

*■  z 


[z(  t )  -  z ( t )  ] 2  ds_(  t ) 

.  2  -  *  -  ]  =  0  (51  ) 

2  s3(t)  db 


or  : 


T 


fc  =  o 


1  [z( t ) -z( t ) ] 2  ds_ ( t ) 

— (1 - 

s  ( t )  s 2  ( t )  db 

•z  2L 


[ z ( t )  -  z ( t ) ]  dz ( t ) 

-  .  -  ]  .  o 

s 2  ( t )  db 


(52) 


Let  us  define: 


[  z( t )  -  z( t )  ] 2 

c ( t )  =  1  -  - .  (53) 

s2(t) 


Because  of  the  asymptotic  stability  of  the  Kalman  Filter 
equations,  as  T  becomes  large,  sz(t)  and  (dsz(t)/db)  will 
approach  constant  values  (which  depend  on  b)  while  the 
multipliers  c(t)  constitute  a  zero  mean  sequence,  hence  the 
relative  contribution  of  the  term  in  (  ds_,(t)/db  )  wi  1  1 
vanish  as  T  tends  to  infinity.  Thus  it  suffices  to  set: 


[  z(t)  -  Z(t)  ] 

s 2  (  t  ) 


dz(  t ) 

•  -  =  0 

db 
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(54) 


D.3  Unbiasedness  of  the  LIKALM  estimate 

The  precise  behavior  of  the  solutions  b  of  equation 
(54)  will  depe  sample  sequences  Z(T),  but  it  is  possible  to 
show  that  if  the  iterative  scheme  illustrated  in  Figure  I X - 3 
converges  to  some  ,  this  estimate  will  be  unbiased,  that 

i  s : 


El  f  |im  1  =  b 


This  can  be  established  by  writing  (for  the  iterative 

scheme,  z(t)  and  s 2 ( t )  are  fixed): 

z 


T 

(z(t)-z(t)]  dz{ t ) 
-  • - 

fc=0  s 2  <  t )  db 

z 


[ Z ( t )  -  Z(t) ] 

t-o  s|(  t  ) 


(55) 


s 2  ( t ) 

{  z ( t ■ 1 )  +  - -  [  z(t-1 )  -  z ( t - 1 )  ]}  (56) 

s£ ( t )  +  r 


which  has  the  expected  value  zero  if,  and  only  if: 

E {  [ z ( t )  -  z(t)]*z(t-1)  }  =  0  (57) 

and : 

E {  [ z ( t )  -  z(  t )  Mz(  t-1  )  -  z(  t- 1  )  }  =  0  (58) 

But  these  conditions  hold  when  the  z(t)  are  generated  by  a 
Kalman  Filter  with  b  =  b,  because  then,  and  only  then,  are 
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MEASUREMENT  NOISE 


FIGURE  I  X - 3 : 


Iterative  scheme 
( f  rom  [ 24 ]  ) 


for  the  LIKALM  algorithm. 
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the  serial  correlations  zero42. 

This  derivation  and  the  properties  of  the  Filtering 
Form  of  the  Maximum  Likelihood  algorithm  have  of  course  been 
generalized  to  the  multidimensional  case.  The  next  two 
sections  present  briefly  the  Kalman  Filter  and  the 
derivation  of  the  LIKALM  algorithm  for  the  multidimensional 
case . 

E.  THE  KALMAN  FILTER 

The  Kalman  filter  is  one  of  the  most  important 
contributions  to  the  field  of  control  theory  in  the  past  two 
decades.  Since  its  original  derivation  by  Kalman  [21],  it 
has  been  rederived  from  many  different  points  of  view  (see 
e.g.  [68], [104])  and  has  been  widely  used  in  stochastic 
control  (see  e.g.  [105],  [106],  [107])  and  particularly  in 
aerospace  applications  such  as  guidance  and  control.  In  its 
straightforward  application,  the  Kalman  filter  involves  the 
optimal  estimation  of  the  state  variables  of  a  system  when 
the  measurements  are  corrupted  with  white  noise. 

A  brief  summary  of  the  Kalman  filter,  following  Athans 
[27],  is  now  presented. 

E.1  Statement  of  the  problem 

Given  a  system  in  state-vector  form: 
x ( t )  =  F ( t -  1 ) «x ( t- 1 )  +  B ( t -  1 ) • u ( t -  1 )  +  G(t-1)«w(t-1)  (59) 

where : 


4 2 See  comment  on  items  i  and  ii  on  page  120 
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x(t)  is  the  n-dimensiona 1  state  vector 

uit)  is  an  r-dimensiona 1  input  vector 

wit)  is  a  p-dimensiona 1  system  noise  vector 

Fit),  B(t),  Git)  are  Known  matrices  of  appropriate 

order , 

suppose  the  following  measurement  equation  holds: 

z( t )  =  Hit) *x i t )  +  v( t ) 

where : 

z(t)  is  an  m-dimensiona 1  output  vector 

vit)  i s  an  m-dimensional  vector  of  mesurement  noise 

Hit)  i s  a  Known  matrix, 

and  where  the  following  assumptions  hold: 

1.  x(0)  is  gauss i an  with  Known  mean  x(0)  and  Known 
covariance  matrix  S(0). 

2.  wit)  is  gaussian  with: 

E [ w( t ) ]  =  0  for  all  t 
Cov[w( t ) , w( t' ) 1  =  Qit)  for  t  =  t'  only 
Q ( t )  is  Known  positive  definite. 

3.  vit)  is  gaussian  with: 

E [v ( t ) ]  =  0  for  all  t 
Covlv(t) ,v( t#  )]  =  R  for  t  =  t' 

R(t)  is  Known  positive  definite 

4.  x(0),  wit)  and  v(t)  are  mutually  independent  for  all 
va 1 ues  of  t . 

5.  uit)  is  a  deterministic  time  sequence. 

6.  Fit),  Bit),  Git)  and  Hit)  are  all  der termi ni st i c . 


(60) 


’ 
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It  is  desired  to  construct  a  "best"  estimate  of  the  state 
vector  x(t)  given  past  values  of  the  input  vector: 

U ( t -  1  )  =  [u(  0 ) , u( 1 ) . u ( t -  1  )  ] 

and  past  values  of  the  measurement  vector: 

Z(t)  =  [z(1),z(2),....,z(t)j 

The  best  estimate  is  denoted  by  x(t/t)  and  is  defined  as  the 
conditional  mean  of  x(t)  given  U ( t - 1 )  and  Z(t),  i.e.: 

x(t/t)  =  E[x(t)/Z(t) , U ( t -  1  )  ]  . 

E.2  Solution  of  the  problem 

The  linearity  of  equations  (59)  and  (60),  together  with 
the  gaussian  assumptions  implies  that  the  conditional 
probability  density: 

P [ x ( t)/Z(t)  ,U( t-1  )  ]  (61  ) 

is  also  gaussian,  and  hence,  it  is  uniquely  character i zed  by 
its  conditional  mean  x(t/t)  and  conditional  covariance 
matrix  S( t/t ) . 

Kalman  [21]  derived  a  powerful  sequential  algorithm  to 
generate  x(t/t)  and  S(t/t)  using  the  properties  of 
orthogonal  projections  (  i.e.  the  error  in  the  estimation  of 
the  state,  e  =  x  -  x,  is  minimal  if  x  is  the  orthogonal 
projection  of  x  into  the  space  of  "approximations")43.  We 
will  only  state  the  general  result  here. 

1.  The  calculation  of  the  conditional  mean  x(t/t)  is  done 

43  The  alternative  derivation  illustated  in  section  D-2  for 
a  one-dimensional  model  can  also  be  generalized 
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on-line  using: 

■  initial  condition: 

x( 0/0  )  =  x ( 0 )  (62) 

■  mean  prediction: 

x ( t / 1 -  1  )  =  F(  t-1  )»x(  t -  1  / 1 -  1 )  +  B(  t-1  ).»u(  t-1 )  (63) 

■  residual  calculation: 

e ( t )  =  z ( t )  -  H( t)«x( t/t-1 )  (64) 

■  mean  update  equation: 

x  ( t / 1 )  =  x ( t / 1 )  +  K ( t ) »e ( t )  (65) 

where  K(t)  can  be  computed  from: 

2.  The  calculation  of  the  conditinal  covariance  matrix  that 
can  be  done  off-line  using: 

■  initial  value: 

S ( 0/0 )  =  S(0)  (66) 

■  covariance  prediction  equation: 

S ( t/t-  1  )  =  F( t-1 )*S(t-1/t-1 )*F'  (t-1) 

+  G(t-1)«Q(t-1) •G' (t-1)  (67) 

■  covariance  update  equation: 

S( t/t )  =  S( t/t- 1 )  -  S( t/t- 1 ) •H'  (t)» 

[ H ( t ) «S( t/t- 1 ) •H/  ( t)  +  R ( t )  ]  -  1  • 

H(t)*S(t/t-1 )  (68) 

■  filter  gain  calculation: 

K  ( t )  =  S  ( t  / 1 )  •FT  (t).R-Mt)  (69) 

The  filter  generates  new  state  estimates  as  new 
observations  become  available,  thus  opening  the 
possibility  of  real-time  estimation.  As  an  important  and 
useful  by-product,  the  filter  generates  the  estimation 
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error  covariance  matrix,  which  measures  the  uncertainty 
in  the  estimate. 

E.3  The  Extended  Kalman  filter 

The  Kalman  filter  described  above  provides  a  very 
useful  method  of  estimation  of  the  states  of  linear  systems. 
However  the  desire  to  use  a  method  similar  to  the  Kalman 
filter  for  nonlinear  models  has  led  to  the  formulation  of 
the  "extended"  Kalman  filter  which  is  then  only  a  suboptimal 
estimator  (in  the  sense  that  an  approximation  is  required  to 
derive  the  estimator  equations). 

A  nonlinear  system  (without  external  input  and  with 
additive  noise,  to  simplify)  can  be  written  as: 

x ( t )  =  f (x( t-  1  )  ,  t-1  )  +  G ( t -  1 ) »w( t- 1 )  (70) 

z( t )  =  h( x ( t ) , t )  +  v( t )  (71) 

where  x,  z,  w,  v,  G  are  defined  as  previously  and: 

jF  (  x  ( t  -  1  )  ,  t  -  1  )  is  an  n-dimensional  nonlinear  vector 
function  of  the  state  x(t)  at  time  t 

h(x(t),t)  is  an  m-dimensional  nonlinear  vector  function 
of  the  state  at  time  t. 

all  other  assumptions  remaining  the  same  as  in  section 
( IX-E.  1  )  . 

Since  the  solution  becomes  quickly  computationally 
infeasible,  even  for  very  simple  nonlinear  systems,  an 
"ad-hoc"  approach  is  usually  used  to  obtain  practical 
equations . 

Assuming  that  the  "best"  estimate  of  the  state 
x ( t -  1  / 1 -  1  )  exists  at  time  t-1,  the  idea  is  to  expand 
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f  (x(  t-1  )  ,  t -  1  )  about  x(  t-1 /t-1)  and  h(x(t),t)  about  x(t/t-1) 
in  a  Taylor  series  expansion  and  to  retain  only  the  first 
order  terms. 

As  a  result,  the  extended  Kalman  Filter  equations  can  then 


be  writ  ten : 

x(t/t-1)  =  f (x(t-1/t-1 ) , t-1 )  (72) 

x(  t/t)  =  x ( t / 1 -  1 )  +  K ( t ) ) *e ( t )  (73) 

e(t)  =  y(t)  -  h(x( t/t- 1 ) , t-1 )  (74) 

where  K(t)  is  calculated  from  the  normal  Kalman  Filter 
equations  with  the  matrices  H(t)  and  F(t)  defined  by: 

F(t)  =  df/dx  for  x  =  x(t/t)  (75) 

H ( t )  =  dh/dx  for  x  =  x(t/t-1)  (76) 


More  advanced  filters  such  as  Second  Order  Filters, 
Single  Stage  Smoothing  Filters,  etc  [108]  can  also  be 
derived  but  this  unnecesser i ly  complicates  an  already 
complex  situation  and  does  probably  not  help  much  given  the 
numerous  uncertainties  in  the  parameter  estimation  problem. 
More  details  on  nonlinear  filtering  can  be  found  in  [68], 
[109],  [110]. 

E.4  A  word  on  innovations 

The  concept  of  "innovations"  was  introduced  by 
Kailath  [111].  In  the  equations  of  the  Kalman  Filter,  the 
quant i ty : 

e(t)  =  z ( t )  -  z ( t / 1 -  1 )  (77) 

is  the  new  information  brought  about  by  the  measurement 

z(t).  The  sequence  e(1), . ,e(t)  is  Known  as  the 

"innovations  sequence",  and  has  been  shown  by  Kailath  [111] 
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to  be  zero  mean,  gauss i an  and  white,  and  have  the  same 
statistics  as  the  measurement  noise  process.  These 
innovations  represent  the  difference  between  the  new 
measurement  z(t)  and  the  prediction  based  on  all  the 
previous  data  z(t/t-1).  Thus  the  terms  e(t)  contain  the 
information  contributed  by  the  z(t)  so  that  e(t)  and  z(t) 
can  be  used  interchangeably.  One  interest  of  introducing 
this  concept  is  that  the  fact  that  the  innovations  sequence 
is  gaussian  allows  us  to  formulate  the  Likelihood  function 
in  a  simple  way. 

F.  DERIVATION  OF  THE  FILTERING  FORM  OF  THE  ML  ESTIMATOR 

This  section  follows  closely  Mehra  [84].  It  is  assumed  that 
he  stucture  of  the  model  is  known.  The  vector  of  unknown 
parameters  in  F  ,  B,  G,  H,  Q  and  R  and  x(0)  is  denoted  by  b. 

It  is  assumed  here  that  b  is  identifiable  from  a 

sequence  of  observations  z( 1 ) . z(T)  made  on  the  system 

with  noise  and  a  sequence  of  known  inputs. 

Let  Z(T)  =  {z( 1 ) . z(T)}  be  our  sample  of  outputs. 

The  Likelihood  function  is  then  L  =  P(Z(T)/b)  and  the 
maximum  likelihood  estimate  is  given  by: 

b  =arg  {max/b  P(Z(T)/b)}  (78) 

Using  Bayes'  Rule: 

P(Z(T)/b)  =  P { z ( 1 ) . . .z(T)/b} 

=  P{z(T)/Z(T-1 ) ;b}*P{Z(T-1 )/b} 

With  successive  applications  of  this  rule,  we  get: 


(79) 
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P{Z(T)/b}=  1|  P{z(t)/Z(t-1);b}  (80) 

L-\ 

Since  the  Logarithm  is  a  monotonic  function,  it  is  more 
convenient  to  write  the  Maximum  Likelihood  estimate  as: 

b  =  arg{  max/b  [Log  P(Z(T)/b)]}  (81) 

T 

=  arg  {  max/b  y*  Log  P [z( t ) /Z ( t- 1 ) ] , b) }  (82) 

tsi 

If  x(0),  w(t),  and  v(t)  are  normally  distributed, 

P (z( t ) /Z ( t -  1 ) , b )  will  also  be  normal  and  can  be  uniquely 
determined  by  computing  the  mean  and  covariance. 

Therefore,  let  us  define: 

E{z( t ) / Z ( t -  1 ) , b }  =  z(t/t-1 )  (83) 

and 

Cov{z ( t ) / Z ( t -  1 ) , b}  =  E{[z(t)-z( t/t-1 ) Mz(t)-z( t/t-1 ) ]'  } 

=  Sz(t/t-1 )  (84) 

For  a  normally  distributed  K-vector  z,  we  have: 

P(z)=  (  1  /  [  (  27])/2»  |  |  Vz)  1  *exp[  (  -  1  /2  )  »e'  «SZ  '  1  »e]  (85) 

where  e  =  z  -  E(z)  and  |  |  is  the  determinant. 

Then : 

Log  P(z)=  constant  - ( 1/2 )»Log| Sz|  -  ( 1 /2 ) • ( e'  S^~ 1 e )  (86) 

Therefore,  our  Log- 1 i ke 1 i hood  function  becomes: 

T~ 

L  =  constants  -  ( 1 / 2 ) •  2Z  {Log  | S  (t/t-1  )  | 

+  [ ( z ( t )  -  z(t/t-1 ) ]' -s^"1 (t/t-1  ) 

•[z( t)  -  z( t/t-1 ) ]} 


(87) 
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The  problem  of  determining  the  Likelihood  function 
has  now  become  one  of  finding  a  way  of  calculatig  the 
conditional  mean  z(  t/t-1)  and  the  covariance  matrix 
Sz(t/t-1). 

These  quantities  are  precisely  the  output  of  a 
Kalman  Filter  given  by  (in  a  slightly  different  form 


than  previously  so  that  Sz  may  also  appear: 

Before  measurement : 

x(t/t-1)  =  f [x( t- 1 /t- 1 ) , t- 1 ]  +  B •u ( t -  1 / 1 -  1 )  (88) 

S^t/t-1)  =  F«S*(t-1/t-1  )*F'  +  GQG'  (89) 

z(  t/t-  1  )  =  h[x( t/t- 1 ) , t ]  ( 90 ) 

S, (t/t-1)  =  R  +  H«S( t/t-1 ) •H'  (91) 

At  time  t  we  have  the  measurement  z(t) 

Then  the  innovation  is: 

e(t)  =  z(t)  -  z(t/t-1)  (92) 

and  the  gain: 

K ( t )  =  Sx(t/t-1 )*H' «S  - 1  (t/t-1  )  (93) 

And  the  update  equations  are:44 

x(t/t)  =  x( t/t- 1 )  +  K(t)»e(t)  (94) 

Sx(t/t)  =  [I  -  K(t)»H]*Sx( t/t-1 )  (95) 

with:  S x( 0/ 0 )  =  Sx(0)  and  x(0/0)  =  x ( 0 ) 


The  Maximum  Likelihood  estimate  is  now  obtained  by 
maximizing  (87)  with  respect  to  b  subject  to  the  constraints 
in  (88)  to  (91).  This  a  very  difficult  optimization 


44Equ.  (95)  is  an  alternative  formulation  of  equation  (68). 
It  follows  from  the  matrix  inversion  lemma: 

If  A  =  B  -  BH'  ( HBFT  +  R ) " 1 HB 
Then  A-1  =  B"1  +  H,R"1H 
See  e .g .  [104,  p .  14] 
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problem. . . 

In  the  case  of  linear  systems,  an  approximation 
suggested  by  Eaton  [99]  simplifies  the  problem  considerably. 
The  gain  K  and  the  covariance  matrix  Sz  are  assumed  to  have 
attained  constant  values.  The  maximization  of  the  loss 
function  (87)  can  then  be  done  analytically  and  in  two 
steps : 

Maximizing  (87)  with  respect  to  Sz  yields: 

r 

sz  =  (1/T).^>  e  ( t /£> )  »e'  ( t /£>)  (96) 


where  b  is  given  by  the  root  of  the  equation: 


5 


de  ( t ) 

e#  ( t )  •  S  " 1  • - 

z  db 


0 


(97) 


which  is  found  by  numerical  minimizaton. 

In  the  nonlinear  case  however,  the  usual  approach  is  to 
treat  the  problem  as  a  nonlinear  programming  problem  in 
order  to  find  an  iterative  solution  to  the  parameter 
est imates . 

Maximizing  (87)  is  equivalent  to  minimizing  the 
negative  Likelihood  fuction  J [b)  with  respect  to  b. 

The  nonlinear  programming  algorithm  is  usually: 

b{  i  +  1 )  =  b( i )  -  R; »gj  ( 98 ) 


where : 


. 

. 
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=  dd/dd  for  b  =  bi  (99) 

R;  is  an  approximation  to  the  second  partial  matrix, 
taken,  in  the  Gauss-Newton  method  as  the  inverse  of  the 
information  matrix  M;. 

M  j  =  E [ d 2  d / db 2  ]  for  b  =  bj  (100) 

F.1  Expressions  of  the  gradient  and  the  information  matrix 
Since  the  expression  of  J  is: 


J  =  -1/2  {e'  »SZ  ( t/t-1  )»e  +  Log|Sz<  t/t-1  )  |  }  (101) 

te  I 


differentiating  with  respect  to  b,  we  obtain  the  k-th 
component  of  the  gradient  as: 


dd 

db 


ie‘  *Sz 


de 

i  • —  - 
dbk 


1 

2 


dS2 

•---•s 

dbk 


1  »e ) 


1  d  S, 

+  ---  tr(S  (102) 

2  2  db^ 


where  the  time  dependencies,  e(t)  and  S^(t/t-1),  have  been 
neglected  for  the  sake  of  clarity;  and  similarly  the  (k,l) 
element  of  the  information  matrix  is  given  by: 


d2  d 

dbj<*db  j 


de  de  d  S  de 

{  —  •  S^- 1  • - e#  . 

db^  d  by  dbj<  db-j 


dS.  de  1  dS  dS 

-  e' *S  - 1 ---  -  — • t r [ S  _ 1 • - *S  "  1  • - ] }  (103) 

'  2  db7  *  dbk  2  *  db)  dbk 


. 
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neglecting  the  following  2nd  order  partials  and  first  order 
partials  of  inverse  matrices: 

+  (1/2)  tr  [  S  _  1 • (d2S  /db.  *db-, )  ] 

+  e'  •S^' 1  •  (de/db^db'j  ) 

-  (1/2)  e'  »S  " 1  •  (dS  /db  )  »S  ~ 1  •  (dS_  /db,  )»5__1»e 

-  (1/2)  e' •S^- 1 • (d2Sz/db^*db^ ) •S^- 1 »e 

-  (1/2)  e' »S  " 1 • (dS  /db.  ) »S  ~ 1 • (dS  /db.)»e 

Z  2  K  Z  Z  1  ~ 

In  [112],  Mehra  proposes  another  approximation  for 
the  informatin  matrix  which  turns  out  to  be  better  in 
pract i ce . 

The  above  information  matrix  was  obtained  by  neglecting 
a  few  second  order  terms  in  the  expression: 

M  =  d2J/db2  and  will  be  denoted  M, . 

The  second  form,  that  will  be  denoted  M2  here,  is  defined 

by: 

=  E {d2J/db2 } 

=  E{ (dJ/db)*(dJ/db)' }  (104) 

and  is  estimated  from  the  sample  as: 

r 

V"  de  de 

M  (K,  1  )  =  /  {  (  ---  )'  •S,-1*  --- 

db  ( K )  z  db(  1  ) 

1  dS2  dS 

+  -  -  tr  [  S  ~ 1  • - •  S  • 1  • -  ] 

2  db(k)  db(l) 

1  dS7  dS, 

+  --  tr  {S'1* - ]  •  t  r  [S’1* - ]}  (105) 

4  db(k)  db( 1 ) 


. 

* 

. 
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This  second  version  has  the  advantage  of  being  always 
nonnegative  definite  so  that  one  does  not  run  into 
computational  problems  (such  as  negative  diagonal  elements) 
as  do  arise  in  the  case  of  M,  . 

The  flow  chart  for  the  Filtering  Form  of  the  Maximum 
Likelihood  algorithm  is  then  as  illustrated  in  Figure  IX-4. 

F . 2  Recurs i ve  equa t i ons 

The  gradient  and  the  Information  matrix  involve  the 
terms  de(t)/db(k)  and  dS2( t/t- 1 )/db(k) .  The  set  of  recursive 
equations  allowing  to  obtain  these  terms  from  a  set  of 
initial  (zero)  values  are  obtained  by  differentiating  the 
Kalman  Filter  equations  (88)  to  (95)  and  are  called  the 
"sensitivity  equations"  of  the  model.  They  can  be  obtained 
in  a  very  s t ra i ght forward  manner  from  the  Kalman  Filter 
equations  as  follows: 

Predict  ion : 

dx( t+1/t )/db(k)  =  df (x(t/t) ,t)/db(k) 

+  F ( t ) »dx( t / t ) /db ( k )  (106) 

dSx(t+1/t)  =  dF(t)/db(k)«Sx(t/t)»F' (t) 

+  F ( t ) *dSx  ( t/t ) Id-  b ( k ) • F ' (t) 

+  F ( t ) »S*( t/t ) •df  (t)/db(k) 

+  dG  ( t ) /db ( k ) »Q ( t ) •G' (t) 

+  G( t )*dQ(  t )/db(k)*G' ( t ) 

+  G ( t ) • 0 ( t ) »dG ( t ) /db ( k )  ( 107 ) 


yielding: 


. 


- 
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INPUT-OUTPUT 
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DATA,  A  PRIORI 
PARAMETERS 


BOUNDS 


FIGURE  I  X  - 4 : 


Flow-chart  for  the  LIAKLM  algorithm. 
( f  rom  [ 24  ]  ) 
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d S  (t+1/t)/db(k)  =  [dH( t  +  1 )/db(k) ]»S,( t  +  1/t )»H'  ( t  +  1 ) 

+  H( t+1 )»[dSK( t+1/t )/db(k) ]»H' ( t+1 ) 

+  H(t+1 )  •  Sx  ( t  + 1  / 1 )  •  [ dH'  ( t+1 )/db(k) ] 

+  dR( t+1 ) /db ( K )  ( 108) 

Update: 

de(t+1)/db(k)  =  -  H ( t+ 1 ) • [dx( t+ 1 / t ) /db ( K ) ) 

-  [dh(x( t+1/t )/db(K) )  (109) 

dK (t+1 )/db(K)  =  |dSx( t+1/t)/oto(k) ]»H'  ( t  +  1 ( t  +  1/t) 

+  Sx ( t  +  1/t ) • t dH'  (t+1 )/db(k) )»St-’(t+1/t) 

-  K(  t+1  )»[dSj.(  t  +  1/t  )/db(k)  1  ( t+1/t )  (110) 

dx( t+1 /t+1 )/db(k)  =  [dx( t+ 1 / t ) /db(k ) ] 

+  [dK( t+1 ) /db(k) 1 »e( t+1 ) 

-  K ( t+ 1 ) • [ de (t+1 )/db(k) ]  (ill) 

dS*( t+l/t+i )/db(k)  =  [dSx( t+i/t )/db(k) ] 

-  [dK( t+i ) /db(k) ]»H( t+i )»S*( t+i/t  > 

-  K( t+1 ) • [ dH ( t+1 )/db(k) )»SX(  t  +  1/t ) 

-  K(t+1)»H(t+1 )• [dSx ( t+1 /t ) /db(k) ]  (112) 

with  dSx( 0/0 )  =  0  and  dx(0/0)  =  0. 


. 


X.  ESTIMATION  WITH  THE  FILTERING  FORM  OF  THE  MAXIMUM 

LIKELIHOOD 

A.  THE  LIKALM  COMPUTER  PROGRAM 

A  computer  program  (  LIKALM  )  using  the  Filtering  Form 
of  the  Maximum  Likelihood  algorithm  was  written  to  estimate 
the  model,  since  no  such  routine  was  available. 

A . 1  Main  features  of  the  LIKALM  program 

LIKALM  is  a  general  program  for  estimating  any  linear 
or  nonlinear  model  written  in  state-space  form  (see 
equations  in  section  ( X - A . 2 ) ) . 

■  A  calling  program  reads  in  the  options  (explained  later) 
and  calls  the  appropriate  subroutines. 

■  A  subroutine  FLET,  called  only  if  the  minimization  is  done 
using  the  F 1  etcher -Powel 1  algorithm  instead  of  the 
information  matrix.  It  reads  in  the  initial  values  and 
parameters  required  by  this  procedure  and  calls  subroutine 
FMFP  (  which  in  turn  calls  FUNCT  as  an  external  function), 
(see  option  1  below). 

■  A  subroutine  FUNCT  sets  the  fixed  maximum  dimensions  of 
all  arrays  and  calls  the  main  subroutine  FUNCT 1 .  The  maximum 
dimensions  are  presently  set  as  follows: 

-  dimension  of  the  model  (N):  10  state  equations  (or 
level  variables) 

-  number  of  parameters  (L):  40  parameters  estimated  at 
one  time  (this  includes  parameters  of  the  various 
covariance  matrices  involved,  if  they  are  estimated) 
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Two  other  dimensions  provide  additional  flexibility:  these 
are  the  dimension  of  the  equation  noise  vector  ( M , 
maximum  =  10)  and  the  dimension  of  the  output  vector  (or 
altenatively  of  the  measurement  noise  vector)45 
(NZ,  maximum  =  10).  In  other  words,  not  all  the  equations 
need  contain  noise  and  not  all  the  states  need  to  be 
measured . 

■  Subroutine  FUNCT1  is  the  main  program:  it  reads  in  all  the 
dimensions  and  initial  values  from  the  input  file  (in 
variable  format,  the  format  for  each  input  being  provided  by 
the  user)  and  prints  them  out  if  required,  reads  in  the  data 
points  from  the  data  file  (also  in  variable  format),  calls 
the  Kalman  Filter  equations,  the  sensitivity  equations,  the 
computation  of  the  gradient  and  the  information  matrix,  the 
computation  of  the  step  size  and  checks  for  convergence4 6 

■  Subroutines  KALM,  DIF,  INFO,  and  STEP  perform  the 
computations  called  for  by  FUNCT 1 . 

■  Several  algorithm  and  output  options  are  provided: 

1.  An  alternative  minimization  method  is  provided:  this  is 
the  Fletcher  Powell  method  [65].  The  main  difference 
between  this  method  and  the  algorithm  presented  in 
section  (  I X - F )  is  that  this  method  avoids  the 
computation  of  the  second  order  derivatives  and  performs 
a  line  search  in  the  direction  of  steepest  descent  to 
determine  the  step  size. 

4  5 A 1 1  measured  outputs  are  assumed  to  be  noisy 
4 6  No  convergence  check  is  yet  provided:  the  program  is 
stopped  manually  by  the  user.  Setting  a  time  limit  on  the 
run  is  a  good  idea 
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The  public  program  FMFP  from  the  IBM  SSP  library  [113] 
performs  this  minimization.  Subroutine  FMFP  requires  the 
values  of  the  function  F  to  be  minimized,  and  of  its 
gradient,  at  different  points.  F  and  dF/dx  are  computed 
as  external  functions.  In  our  case,  F  is  the  Likelihood 
function  and  dF/dx  its  gradient.  Both  are  computed  in 
subroutine  INFO  as  in  the  normal  algorithm.47 

2.  The  program  can  be  run  as  an  extended  Kalman  Filter 
only;  in  this  case,  the  time  t,  and  the  values  of  the 
estimate  of  the  state  vector  and  its  covariance  matrix 
are  printed. 

3.  A  Least  Squares  subroutine  OLSQ  that  operates  on  the 
values  of  the  estimated  state  vector  yielded  by  the 
Kalman  filter  is  also  provided  if  we  wish  to  perform  the 
iteration  on  the  parameter  vector  using  successive  OLS 
estimates  rather  than  the  Maximum  likelihood  method48. 

4.  The  covariance  matrices  of  the  equation  noise  Q  and  the 
measurement  noise  R  can  be  specified  to  be  either  both 
fixed  or  estimated  or  only  one  of  them  can  be  estimated 
by  choosing  the  appropiate  option. 

5.  Two  output  options  allow  the  user  to  print  out  the 
headings  (includes  all  the  initial  values)  and  the 
information  matrix,  its  inverse  the  gradient  vector  and 


4 7 The  method  does  not  seem  to  work  properly  however  because 
the  line  search  values  affect  the  parameter  vector  so 
drastically  that  many  of  the  matrices  end  up  being  ill 
condi t ioned 

48This  method  is  not  guaranteed  to  converge,  but  was  tried 
as  a  simplification  to  LIKALM.  The  results  were  not 
convincing  and  are  not  presented  here 
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the  step  vector.  (The  gradient  and  the  step  are  always 
printed  out  at  each  iteration  but  in  formats  that  can 
easily  be  subject  to  overflow  if  any  problem  arises) 

■  In  addition  to  the  input  file  and  the  data  file,  the 
user  has  two  write  two  FORTRAN  subroutines  STATE  and  DSTATE 
specifying  the  state  equations  and  the  matrix  F  (see  section 
(IX-E.3)),  and  the  derivatives  of  the  state  equations  and  F 
with  respect  to  the  parameter  vector.  If  the  Least  Squares 
option  is  used,  an  OLS  subroutine  specifying  the  form  of  the 
estimated  equations  is  also  required. 

A. 2  The  FORTRAN  equations 

The  program  assumes  a  data-generat i ng  state-space 
nonlinear  model  of  the  form49 


x(t+1) 

=  FUN [ x ( t )  ]  +  G«u(t) 

(  1  ) 

z(t  +  1 ) 

=  H«x( t  +  1  )  +  v( t  +  1 ) 

(2) 

wi  th : 

Q(t)  =  cova{u(t)}  (MxM) 

R(t)  =  cova{v(t)}  (NZxNZ) 

SEO  =  cova{x(0)}  (NxN) 

G:  ( NxM ) 

H:  ( NZxN ) 
x:  ( Nx 1 ) 
z:  ( NZx 1 ) 

u:  (Mxl)  equation  noise 
v:  ( NZx 1 )  measurement  noise 

49No  external  input  is  provided  since  System  dynamic  models 
usually  generate  all  dynamics  within  the  boundaries  of  the 
system  (see  section  (I-A.2)).  This  however  could  easily  be 
added . 


?  •  ■  9 
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If  we  linearize  the  non-linear  model  about  he 
nominal  trajectory  xnom(t)  =  x(t/t)  and 
x  non)  ( t +  1  )  =  x(  t+1/t )  (Taylor  expansion  limited  to  the 
first  order  terms),  the  equations  of  the  Kalman  Filter 


are 


5  0 


if  F(x(t/t)  =  cfFUN  { x  ( t ) }  /dx  at  x  =  x(t/t) 
and  adopting  the  conventions: 

A  =  parameter  vector  b  ( L X  1  ) 5 1 


x(t+1/t)  =  XP 
x ( t / 1 )  =  XE 
Sx( t+1/t )  =  SP 
S*(t/t)  =  SE 
F[x( t/t) ]  =  F 
Sz(t+1/t)  =  sz 
e( t+1  )  =  NU52 
K( t+1  )  =  KA53 
z(t+1)  =  Z 
x ( t+1/t+1  )  =  TXE 


Sx ( t+1/t+1  )  =  TSE 


P  for  "predicted" 
E  for  "estimate" 


and 


where  T  following  a  matrix  denotes  the  transpose, 

I  following  a  matrix  denotes  the  inverse  of  that  matrix 
Then  the  equations  of  the  Kalman  filter  become: 


50 See  section  ( I X - E  ) 

51  The  parameter  vector  and  the  two  noise  covariance 
matrices  will  be  referred  to  as  A,  Q  and  R  respectively,  in 
all  experimental  results,  even  if  they  are  only 
one-dimensional  variables 

52  The  use  of  NU  instead  of  e  comes  from  an  earlier  notation 
for  the  innovations 

53To  avoid  the  confusion  with  the  index  K  in  FORTRAN 
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Predict  ion : 

XP  =  FUN(XE) 

SP  =  F*SE*FT  +  G*Q*GT 
SZ  =  H*SP*HT  +  R 


Update : 


NU  = 

Z  -  H*XP 

KA  = 

SP*HT  *SZ I 

TXE 

=  XP  +  KA*NU 

TSE 

=SP  -  KA*H*SP 

and  the 

sensitivity  equations: 

DXP 

=  DFUN  +  F*DXE 

DSP 

=  DF*SE*FT  +F*DSE*FT  +  F*SE* 

DFT 

+  DG*Q*GT  +  G*DQ*GT  +  G*Q* 

DGT 

DSZ 

=  DH*SP*HT  +  H*DSP*HT  +  H*SP 

*DHT  + 

DR 

DNU 

=  -H*DXP  -  DH*XP 

DKA 

=  DSP*HT*SZ I  +  SP*DHT*SZI  - 

KA*DSZ 

*SZI 

DTXE 

=  DXP  +  DKA*NU  +  KA*DNU 

DTSE 

=  DSP  -  DKA*H*SP  -  KA*DH*SP 

-  KA* 

H*DSP 

where  D( 

.)  denotes  d(.)/dk  and  is  a 

matrix 

of  app 

order  and  dimensions. 

The  negative  Likelihood,  its  gradient  and  information  matrix 
MA  are  then  computed  as: 


d  =  1/2  2Z  <  NUT*SZI*NU  +  LOG  |  SZ  |  ) 5  4  (3) 

Dd  =  2Z<(  NUT*SZ1*DNU  -  1/2  NUT*SZI*DSZ*SZI*NU 


54 |. |  denotes  the  determinant 


. 
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+  1/2  TR ( SZI *DSZ ) } 5 5  (4) 

MA  =  ~>  { (  DNUT*SZ I *DNU  -  NUT*SZI*DSZ*SZI*DNU 

-NUT*SZI*DSZ*SZI*DNU  -  1/2  TR ( SZI *DSZ*SZI*DSZ  ) }  (5) 
Finally,  the  iteration  step  is  computed  as: 

DA  =  MAI *Dd  (6) 


and : 


\+,  =  Ai  -  *DAi  (7) 

where  o(;  is  a  positive  number  adjusted  to  assure  convergence 

(generally  chosen  equal  to  1 ) . 


B.  EXPERIMENTAL  RESULTS 

Four  models  ranging  from  simple  linear  to  more  complex 
nonlinear  were  chosen  to  test  the  performance  of  the 
estimation  technique. 

B . 1  One  Dimensional  Model 

The  following  results  are  meant  as  an  illustration  of 
the  intuitive  approach  to  LIKALM  presented  in  section 
(IX-D).  However,  this  will  be  more  than  just  an  i lustration 
since  it  will  allow  us  to  explore  the  behavior  of  the 
Likelihood  function,  determine  its  minima  manually  and 
investigate  the  possibility  of  estimating  the  covariance 
matrices  Q  and  R  (which  are  simply  variances  in  this  case). 
The  Model 

As  alredy  presented  in  section  (IX-D.1)  the  model 
used  here  is: 

x ( t+ 1  )  = . 8»x ( t )  +  u( t )  (  8 ) 


55  TR  stands  for  trace 
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z( t+1 )  =  x( t  +  1  )  +  v( t+1 )  (9) 

xo  is  chosen  as  10  and  since  the  mean  value  of  x,  x,  is 
about  5,  the  standard  deviations  of  u(t)  and  v(t)  are 
chosen  as  1%  and  10%  of  x  respectively,  so  that: 

Q  =  0.0025 
R  =  0.25 

u  and  v  are  generated  by  the  GGNOF  routine56  with  seed 
numbers  1267  and  4328  respectively. 

Three  sets  of  experiments  will  be  described  here:  the 
illustration  of  the  LIKALM  algorithm  reinitialization 
procedure  at  each  point  in  time,  as  opposed  to  that  of 
OLS;  the  plotting  of  the  Likelihood  function  and  the 
manual  determination  of  its  minima  with  respect  to 
different  parameters;  and  the  actual  estimation  of  the 
model  with  a  100  data  points. 

1.  Reinitialization  in  OLS  and  in  LIKALM 

To  illustrate  the  reinitialization  procedure 
explained  in  section  (IX-C.1),  the  model  was  run  with  10 
data  points,  with  Ao  =.6  (instead  of  the  true  value  .8). 

■  The  Least  Squares  algorithm  converges  (with 
dS/dA  =  0,  where  S  is  the  criterion  to  be  minimized)  in 
two  iterations  yielding: 

A  =.74. 

The  two  trajectories,  wi th  A  =  .6  and  A  =  .74  are  shown 
in  Figures  X-1  and  X-2. 

As  the  legend  indicates,  at  each  point  in  time,  the 


5 6  See  p.  81 
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*  :  real  state 
Z  :  measured  data 
E  :  estimated  state 


FIGURE  X -  1  :  One -d i mens i ona 1  model:  simulation  of  trajectory 
using  0L5  (first  iteration). 
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2.  3.  4.  5.  6.  7.  8  910. 

*  :  real  state 
Z  :  measured  data 
E  est imated  state 


FIGURE  X-2:  One-dimensional  model:  simulation  of  trajectory 
using  0L5  (last  (second)  iteration). 
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simulation  is  reinitialized  on  the  measured  data  point, 
thus  Keeping  the  trajectory  far  from  the  noiseless  path. 
The  values  of  x  for  the  noiseless  path,  the  noisy 
measurements  z,  and  the  estimated  state  x  are  also 
indicated  in  Table  X-1. 

■  The  LIKALM  program  (run  here  with  all  unknowns 
specified  at  their  true  ideal  values),  starting  with 
Ao  =  .6  converges57  in  4  iterations  yielding: 

A  =  .79  ±  0.01 

with  L  =  0.97  and  cfL/dA  =  3.27 
The  first  and  last  trajectories  are  shown  in  Figures  X-3 
and  X-4,  and  the  values  of  x ,  z  and  x  are  shown  in 
Table  X-2. 

Here  the  simulation  is  reinitialized  at  the  Kalman 
estimate  of  the  state.  For  the  first  iteration,  the 
trajectory  is  much  worse  than  for  OLS  since  the  Kalman 
Filter  has  wrong  parameter  values,  but  by  the  fourth 
iteration,  the  simulated  trajectory  is  now  remarkably 
closer  to  the  noiseless  path. 

2.  Plots  of  the  Likelihood  Function 

It  is  easy,  in  this  one  dimensional  case,  to  plot 
the  behavior  of  the  Likelihood  function  with  respect  to 
A,  Q  or  R,  to  obtain  an  idea  of  its  general  shape,  and 
find  its  minima  manually  in  order  to  check  whether  the 
algorithm  converges  to  the  absolute  minimum.  Having 
obtained  an  intuitive  grasp  of  the  behavior  of  the 

5 7 The  convergence  criterion,  e  taken  as  the  change  in  the 
likelihood  function,  was  set  to  0.001) 
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ITERATION  1 


X 

TX 

Z 

XE 


A  =  0 . 6000 


t 

X 

TX 

Z 

XE 

1 

10.00 

7 . 88 

6 . 97 

6 . 25 

2 

7 . 88 

6 . 33 

6 .06 

4.18 

3 

6 . 33 

5.04 

4 . 72 

3 . 64 

4 

5.04 

3 . 97 

3 . 08 

2 . 83 

5 

3 . 97 

3 .07 

3 . 52 

1  .  85 

6 

3 . 07 

2 . 45 

2.65 

2.11 

7 

2 . 45 

1 . 97 

2.11 

1  .  59 

8 

1  .  97 

1  .  52 

0.95 

1  .  27 

9 

1  .  52 

1  .  24 

2 . 52 

0 . 57 

10 

1  .  24 

0.96 

0.76 

1.51 

ITERATION 

A  = 

0. 

7405 

t 

X 

TX 

Z 

XE 

1 

10.00 

7 . 88 

6 . 97 

7.71 

2 

7 . 88 

6 . 33 

6 . 06 

5  .  16 

3 

6 . 33 

5 .04 

4 . 72 

4 . 49 

4 

5 . 04 

3 . 97 

3 . 08 

3 . 50 

5 

3  .'97 

3.07 

3 . 52 

2 . 28 

6 

3.07 

2 .45 

2 . 65 

2.61 

7 

2 . 45 

1  .  97 

2.11 

1 . 96 

8 

1 . 97 

1  .  52 

0.95 

1  .  56 

9 

1  .  52 

1  .  24 

2 . 52 

0.70 

10 

1  .  24 

0.96 

0 . 76 

1  .  87 

state  at  time  t 
state  at  time  t+1 
measured  data  at  t+1 


S 

G 

M 

0. 52 

-15.02 

216.96 

4 . 04 

-41.18 

3  14.  13 

5 . 22 

-54 . 33 

387 . 54 

5 . 28 

-56 . 7  1 

432  .  12 

8 .08 

-67.02 

451.14 

8 . 36 

-70.78 

475 . 96 

8 . 63 

-73 . 54 

489 . 97 

8 . 74 

-72  .  19 

498 . 87 

12.54 

-75 . 89 

500.66 

13.10 

-72  .  13 

513.35 

2 

S 

G 

M 

0.55 

15.47 

216.96 

1  .  36 

2 . 96 

314.13 

1.41 

0.12 

387 . 54 

1  .  58 

4.01 

432 . 12 

3.11 

-3 . 63 

451.  14 

3.12 

-3.91 

475 . 96 

3.14 

-4 . 70 

489 . 97 

3 . 52 

-2  .  10 

498  87 

6 . 82 

-5 . 54 

500 . 66 

8.03 

0.00 

513.35 

S  :  estimation  criterion 
G  :  gradient  of  5 
M  :  second  derivative  of 


estimate  of  state  at  time  t+1 


S 


TABLE  X - 1 :  Values  of  variables  for  figures  X-1  and  X-2. 
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0.  1 .  2  . 


3.  4.  5.  6.  7.  8  9.10. 


*  :  real  state 
Z  :  measured  data 
E  :  estimated  state 


FIGURE  X  -  3 :  One-dimensional  model:  simulation  of  trajectory 
using  LIKALM  (first  iteration). 
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*  :  real  state 
Z  :  measured  data 
E  :  estimated  state 


FIGURE  X  - 4  :  One-dimensional  model:  simulation  of  trajectory 
using  LIKALM  (last  (fourth)  iteration). 


ITERATION  1 


A  =  0.6000 


t 

X 

TX 

Z 

ZP 

E 

XE 

L 

G 

M 

1 

10.00 

7 . 88 

6 . 97 

6.00 

0.97 

6 .01 

1.18 

-5 . 40 

11.76 

2 

7 . 88 

6 . 33 

6 . 06 

3.61 

2 . 45 

3 . 64 

12.36 

-88 . 2  1 

53 . 70 

3 

6 . 33 

5.04 

4 . 72 

2.18 

2 . 54 

2 . 22 

24 . 37 

-250. 1 1 

405 . 70 

4 

5.04 

3 . 97 

3.08 

1  .  33 

1  .  75 

1  .  36 

29 . 73 

-380.56 

1215.73 

5 

3 . 97 

3.07 

3 . 52 

0.82 

2.71 

0.86 

43 . 48 

-619 . 32 

2072 . 87 

6 

3 . 07 

2 .45 

2 . 65 

0.51 

2.13 

0.55 

5  1.76 

-823 . 50 

3407 .00 

7 

2 . 45 

1 . 97 

2.11 

0.33 

1  .  78 

0.35 

57 . 32 

-987 . 27 

4764 . 79 

8 

1  .  97 

1  .  52 

0 . 95 

0.21 

0.73 

0.22 

57 . 70 

-1040.47 

5961 .08 

9 

1  .  52 

1  .  24 

2 . 52 

0.13 

2 . 38 

0.17 

68.21 

-  1225 . 46 

6634 . 08 

10 

1  .  24 

0.96 

0 . 76 

0.  10 

0.66 

0.11 

68 . 39 

-  1269 . 65 

768  1  . 29 

ITERATION  4 

A  =  0.7924 


t 

X 

TX 

Z 

ZP 

E 

XE 

L 

G 

M 

1 

10.00 

7  88 

6 . 97 

7 . 86 

-0.89 

7 . 86 

0.90 

-4 . 29 

11.76 

2 

7 . 88 

6 . 33 

6.06 

6.18 

-0.12 

6.18 

0.24 

-2.61 

7  1.70 

3 

6 . 33 

5 . 04 

4 . 72 

4 . 86 

-0.14 

4 . 85 

-0.4  1 

-0.49 

145 . 13 

4 

5 .04 

3 . 97 

3.08 

3 . 82 

-0 . 73 

3 . 80 

-0.04 

-13.54 

232  .  16 

5 

3 . 97 

3.07 

3 . 52 

2 . 99 

0.53 

3.00 

-0.  16 

0.60 

484 . 83 

6 

3.07 

2 . 45 

2 . 65 

2 . 36 

0.29 

2 . 37 

-0.69 

5 . 68 

559 . 89 

7 

2 . 45 

1  .  97 

2.11 

1  .  86 

0.25 

1  .  87 

-1.25 

8 . 57 

626 . 66 

8 

1  .  97 

1  .  52 

0.95 

1  .  47 

-0.52 

1  .  46 

-1.40 

12.92 

713.06 

9 

1  .  52 

1  .  24 

2 . 52 

1.15 

1  .  37 

1.18 

1 .60 

-6 . 29 

795 . 58 

10 

1  .  24 

0.96 

0. 76 

0.93 

-0.  16 

0.92 

0.97 

3 . 27 

1137.57 

X 

state  at 

time  t 

L 

Likelihood  function 

TX 

state  at 

time  t  + 1 

G 

gradient  of  L 

Z 

measured 

data  at  t+1 

M 

second  derivative  of  S 

ZP 

XE 

pred i cted 
estimate 

mesurement 

of  state  at  time  t+1 

E 

i nnovat i on 

TABLE  X - 2 :  Values  of  variables  for  figures  X-3  and  X-4. 
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algorithm  under  these  circumstances,  it  becomes  easier 
to  understand  its  behavior  when  the  multidimensionality 
makes  it  difficult  to  visualize. 

Table  X-3  presents  the  minima  of  the  Likelihood  function 
and  the  values  of  its  gradient  and  second  derivatives 
under  various  conditions. 

In  the  case  of  L(A),  the  initial  value  of  the  Likelihood 
function,  L  =  68.39  with  a  gradient  of  -479.8  is 
decreased  to  L  =  0.97  with  a  gradient  of  4.8.  Both 
versions  of  the  Information  matrix58  Ml  and  M2  have 
similar  positive  values  indicating  that  we  do  indeed 
have  a  minimum.  In  every  case,  the  gradient  sign  change 
from  -  to  +  corresponds  to  the  minimum  of  the  Likelihood 
function . 

It  is  interesting  to  note,  however,  that  in  the  case  of 
L(Q)  and  L(R),  Ml  behaves  inconsistently  since  it  is 
negative  at  the  minimum!  This  confirms  Mehra' s 
statement59  that  this  matrix  can  be  non -positive 
definite  and  that  it  is  preferable  to  use  the  second 
form,  M2,  which  remains  always  positive. 

Table  X-3  also  shows  that  the  minimum  L(Q)  is  obtained 
for  Q  negative!  This  suggests  that  we  probably  are  in 
presence  of  an  identification  problem  since  the 
Likelihood  function  does  not  seem  to  be  able  to  display 
a  minimum  for  the  correct  value  of  Q.  The  minimum  for  R 


58For  a  description  of  the  two  versions,  see  section 
(IX-F.1) 

5 9 See  section  (IX-F.1) 
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TABLE  X - 3 

One-dimensional  model 
Manual  Minima  of  the  Likelihood  function 


Cond i t i ons 

Var i abl e 

Mini  mum 

L i kel i hood 

I  nforr 

mai 

Ml 

na  t  i  on 
:r  i  x 

M2 

« 

55  =  0.01 

0  =  0  0025 

R  =  0.25 

A 

0.78 

1 .05 

867  1 

8657 

SS  =  0.001 

A  =  0 . 8 

R  =  0.25 

0 

0.04  5 

1.71 

-70 

1  14 

SS  =0.0001 

A  =  0.786 

R  =  0.25 

0 

-0 . 084 

0.95 

2346 

SS  =  0.05 

A  =  0. 8 
0=0.0025 

R 

0.40 

0.97 

-30 

15 

SS  =  0.05 

A  =  0.786 
0=0.0025 

R 

0.35 

0. 38 

19 

SS  =  0.05 

A  =  0.786 
0=0.0045 

R 

0 . 35 

0.77 

15 

Cond i t i ons : 

10  data  points 

0=1%.  R  = 1 0% 

SS  is  the  step  size  for  the  manual  search 
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is  acceptable. 

The  plots  of  L ( A ) ,  L(Q)  and  L(R)  in  Figure  X-5  show 
that  the  minimum  for  L ( A )  is  relatively  well  defined  and 
attained  via  a  steep  slope.  (The  gradient  varies  from 
-383  for  A  =  0.71  to  +2446  for  A  =  0.9).  This  is  also 
true  for  L ( R )  (with  A  =  Amin)  where,  dl/dR  varies  from 
-618  for  R  =  0.05  to  0  for  R  =  0.35,  although  the  ascent 
after  the  minimum  is  much  slower  (the  slope  is  only  +3 
for  R  =  1  )  . 

However,  the  situation  is  very  different  for  Q  for  which 
L(Q)  (with  A  =  Amin)  is  extremely  flat.  L  varies  from 
0.95  to  1.08  for  Q  varying  from  -0.01  to  0.03,  with  a 
coresponding  variation  for  the  gradient  from  -1.7  to 
4.56,  thus  making  it  very  difficult  for  the  algorithm  to 
locate  the  minimum. 

Running  the  simulation  with  a  hundred  data  points  does 
not  change  this  situation  much. 

If  we  now  increase  the  equation  noise  level  from  1% 
to  10%  (R  remaining  at  its  previous  10%  value),  the 
Likelihood  function  L(Q)  displays  a  minimum  for  Q 
between  0.20  and  0.22,  as  indicated  in  Table  X-4. 

The  behavior  of  the  Likelihood  function  with  respect  to 
Q  and  R  is  now  very  similar.  They  both  attain  their 
minimum  in  very  close  proximity  to  the  true  values,  with 
similar  Likelihood  values  at  the  minimum  and  a  steep 
descent,  as  illustrated  by  Figure  X-6. 

The  minimum  for  A  is  now  much  worse  and  the  descent  much 
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FIGURE  X - 5  Plots  of  the  Likelihood  function  near  its  minima  with  respect  to  A,  0  and 
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TABLE  X  -  4 

One-dimensional  model 

Manual  Minima  of  the  Likelihood  function 
10%  equation  noise 


Cond i t i ons 

Variable 

Mini  mum 

L  i  ke 1 i hood 

M 

SS  =  0.05 

A 

0.68 

17.96 

707 

0  =  0  25 

R  =  0.25 

SS  =  0.05 

0 

0.20 

23 . 37 

439 

A  =  0 . 8 

R  =  0.25 

SS  =  0.05 

0 

0.20 

17.68 

429 . 3 

A  =  0.68 

R  =  0.25 

SS  =  0.05 

R 

0.20 

17.96 

463 . 5 

A  =  0.68 

0  =  0.25 

Cond i t i ons  : 

100  data  points 

0=10%.  R  = 1 0% 


L  (  A  ) 
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FIGURE  X-6  Plots  of  the  Likelihood  function  near  its  minima  with  respect  to  A,  0  and 
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slower  than  previously.  The  fact  that  the  algorithm  is 
capable  of  determining  accurately  only  two  out  of  the 
three  parameters  (the  third  one  appearing  with  a  large 
standard  deviation)  depending  on  the  experimental 
conditions  is.  indicative  of  an  identification  problem. 
This  problem  will  be  delt  with  in  the  following 
sections . 

3.  Estimation  of  A,  0  and  R. 

■  Description  of  the  problem 
One  of  the  major  problems  encountered  in  the  use  of  the 
Filtering  Form  of  the  Maximum  Likelihood  algorithm  is 
the  need  to  either  specify  or  estimate  the  noise 
covariance  matrices  Q  and  R,  as  well  as  the  initial 
state  (XEo)  and  its  covariance  matrix  (SEo)  that  are 
needed  for  the  Kalman  filter.  This  is  also  a  problem 
generally  encountered  in  the  use  of  Kalman  Filter  when 
the  statistics  of  the  process  and  measurement  noise 
sequences  are  not  known.  Generally,  some  knowledge  of 
the  noise  statistics  is  assumed  or  available  and  the 
problem  of  estimating  them  it  not  even  posed.  In  other 
engineering  applications,  such  as  aerospace 
applications,  however,  it  is  a  rather  difficult  task  to 
determine  the  statistics  of  many  unknown  sources  of 
noise,  and  errors  are  then  inevitable  in  the  "a  priori" 
values  assigned  to  the  covariance  matrices.  T.  Nishimura 
[114]  examines  the  effects  of  errors  in  these  "a  priori" 
statistics  on  the  performance  of  the  resulting 
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suboptimal  filters.  Although  his  study  is  limited  to 
linear  systems,  it  still  shows  that  even  in  this  case, 
the  Kalman  Filter  behaves  very  poorly  when  the 
measurement  noise  covariance  matrix  R  is  either  very 
large  or  very  small  compared  to  the  true  value. 

Mehra  [84]  suggests  using  for  R  the  sample 
covariance  matrix  obtained  from  the  output  error 
method60.  There  is  however  no  way  of  obtaining  an 
estimate  of  Q,  the  process  noise  covariance  matrix, 
unless  some  value  is  available  from  previous 
experiences,  or  the  ratio  of  Q  to  R  is  Known. 

Attempting  to  estimate  both  0  and  R  leads  to 
ident i f i abi 1 i ty  problems,  as  illustrated  below. 

■  The  ident i f i abi 1 i ty  problem 
Referring  back  to  section  (IX-D),  suppose  that  we  now 
wish  to  estimate  not  only  b,  but  also  q  and  r.  We  can 
show  with  this  simple  model  that  there  will  be  an 
i dent i f i abi 1 i ty  problem. 

Assuming  that  the  Kalman  Filter  has  attained  a 
steady  state,  consider  that  the  asymptotic  algorithm 
ignores  the  gradient  terms  dsz/db,  dsz/cfq  and  dsz/dr  in 
the  Likelihood  maximization  and  includes  only  the  terms 


[z( t )  -  z( t ) ] 


grad[z( t ,b,q , r ) ] 


(10 


L-  o 


60The  output  error  method  minimizes  the  square  of  the  error 
between  the  actual  system  output  and  the  output  of  a  model 
used  to  represent  the  actual  system.  The  method  assumes 
measurement  noise  but  no  process  noise 
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which  is  equivalent  to  minimization  of  the  sample 
covariance  of  the  innovations  (assuming  sz(t)  constant): 


t  :fl 


z( t ,b,q , r ) ] 2 


(  11  ) 


But,  as  T  >  oo  ,  this  is  asymptotic  to: 

E {e}  =  E { [ x ( t )  -  x ( t )  +  v(t) ] 2}  =  s2  +  r 

^  s 2 ( true )  +  r  ( true )  (12) 

for  a  Kalman  Filter  with  "true"  parameters. 

The  algorithm  will  thus  want  to  find  values  of  b,  q  and 
r  which  minimize: 

J  =  sj  +  r  (13) 

subject  to  the  Kalman  Filter  variance  equation  in  its 
steady-state  limit  form: 

S2»r 

s 2  —  b 2  • -  +  q  (14) 

*  H  ♦  r 


But  this  problem  is  under -speci fied  since,  assuming  a 
minimum  value  J(min)  exists,  with  cor respondi ng  values 
of  s2(min)  and  r(min),  any  values  of  b  and  q  satisfying 
the  equation: 


s2 (mi n ) »r (min ) 

s2(min)  =  b2  •  -  +q  (15) 

x  s2(min)+r(min) 
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are  solutions,  hence  are  non-unique. 

One  can  avoid  this  problem  by  fixing  one  parameter, 
say  r,  then  minimizing  sj  as  a  function  of  b  and  q,  and 
finally  solving  for  a  new  r  as  found  from  the  sample 
covariance  matrix: 

r  =  J(b,q,r)  -  s^{b,q,r)  ( 16) 

The  iteration  would  be  then  repeated  with  r  replacing  r, 
and  so  on . . . 

This,  however,  is  applicable  only  in  the  case  of 
linear  systems,  when  the  Kalman  Filter  can  be  assumed  to 
have  attained  a  steady-state,  and  is  in  essence  the 
simplification  proposed  by  Eaton  [99]  (see  section 
( I X - F  )  )  .  There  is  no  theory  as  to  what  applies  to  the 
nonlinear  case. 

In  this  case,  the  general  trend  of  practice  seems 
to  be  to  manipulate  the  information  matrix  by  numerical 
methods  when  facing  problems,  as  explained  by  Mehra 
[84],  [112]. 

Stating  that  innacurate  noise  statistics  (usually 
chosen  in  an  ad-hoc  way  or  from  "a  priori"  Knowledge) 
are  the  cause  of  bias  and  ident i f i abi 1 i ty  problems  in 
the  use  of  the  Extended  Kalman  Filter  as  a  nonlinear 
state  and  parameter  estimator,  Ljung  [115]  proposes  a 
different  form  of  parameterization  which  could  also  be 
applied  to  the  case  of  the  LIKALM  algorithm. 

"  In  the  model,  there  are  certain  assumptions 
associated  with  the  noise  covariance  matrices, 
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whether  parameter ized  or  not.  It  should  be  noted 
that  the  effect  of  these  assumptions  is  in  fact 
only  to  provide  a  Kalman  Filter  gain.  It  is  this 
gain  that  has  the  algorithmic  importance  and  the 
noise  assumptions  are  only  vehicles  to  arrive  at 
it.  Therefore,  it  should  in  most  cases  be  a  good 
idea  to  parameterize  the  steady-state  Kalman 
gain  rather  than  the  covariance  matrices.  This 
will  normally  involve  fewer  parameters  (which 
means  that  usually  the  individual  noise 
covariance  are  not  identifiable  -  only  the 
Kalman  gain  and  the  innovations  covariance 
matrix  are ) . "  [115] 

■  An  "ad-hoc"  solution 

An  ad-hoc  way  to  circumvene  the  problem  in  our  case  is 
suggested  here. 

Referring  back  to  section  (II-C.4),  in  the  case  of 
a  regression  with  a  single  variable,  we  have: 

e  =  y  -  b»x  (17) 

where  e,  y  and  x  are  (Txl)  and  T  is  the  number  of 
observations 

An  estimate  of  the  residual  sequence  is  obtained  by 
replacing  b  by  its  estimate  b: 

e  =  y  -  b»x .  (18) 

An  estimate  of  the  variance  of  the  noise  process  is  then 
given  by: 
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si  =  (  1/T)»e'e.  (19) 

e  —  — 

I f  we  now  have  measurement  noise  added,  adopting  the 
same  notations  as  in  section  (II-C.4),  we  have: 

w  =  e  +  v  -  bu  (20) 

where  w  =  _Y  -  b«X 

and  we  have  the  following  relationship  between  the 
various  noise  variances: 

si=s£+si+b2»s2  (21) 

W  C  v  u. 

which  can  be  estimated  as: 

S  2  —  S  2  +  S2  +  t)2.S2  (22) 

W  C.  V  U_ 

In  the  multiple  regression  case,  the  situation  becomes 

more  complex. 

If  we  now  have: 

e  =  y  -  Xb  (23) 

where  e  and  y  are  ( T x 1 ) ,  X  is  (TxN),  b  is  (Nxl)  and 
N  is  the  number  of  regressors. 

If  x(1)....x(N)  are  measured  with  noise  such  that 
z ( 1 )  =  x( 1 )  +  u( 1 )  ....  z(N)  =  x(N)  +  u(N),  and 
Y  =  y  +  v,  then: 

Y  =  Zb  +  w  (24) 

where : 

w  =  e  +  v  -  Ub  (25) 

and : 

Z  =  {u( 1 ) . . . .u(N) } 

U  =  {u( 1 ) . . . . u ( N ) } 

We  then  obtain  an  expression  for  the  variance  of  the 
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noise  process  as: 

sw  =  si  +  SJ  +  bU#  Ub  (26) 

assuming  again  that  all  noise  sequences  are  not 
cor re  1 ated . 

If  the  system  is  in  system  dynamics  formulation  so 
that  each  equation  can  be  written  in  the  form: 

Y  =  YL' »b  +  e  (27  ) 

where  YL  is  (TxN),  matrix  of  lagged  values  of  all 
endogenous  variables,  and  if  we  measure  all  the 
variables  with  noise,  equation  (26)  becomes: 

sw  =  sl  +  su  +  b'  U'  Ub  (28) 

But  U' U  now  becomes  the  measurement  noise  covariance 
matrix  R  (assumed  diagonal). 

Thus,  if  we  Know  the  ratio  between  equation  and 
measurement  noise  for  each  equation,  the  above  equation 
can  be  solved  for  rii  and  qi i . 

This  again  demonstrates  that  our  problem  is 
under -speci f i ed  and  therefore  nonuniquely  solvable. 
sw2  can  be  computed  from  the  sum  of  squared  residuals 
SSR  yielded  by  the  TSP  Least  Squares  routine  [89]. 

A  diagonal  matrix  S  can  thus  be  constructed  from  the  SSR 
values  for  each  individual  equation  and  we  have  the 
re  1  at i onshi p : 

S  =  Q  +  R  +  M  (29) 

where  M  is  a  matrix  involving  terms  of  the  type  b' U' Ub 
or  b' Rb. 

The  magnitude  of  the  terms  in  M  depends  on  the 
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estimates  of  the  parameter  values  as  well  as  on  the 
various  measurement  noise  variances.  The  terms  in  M  are 
however  always  positive  so  that  S  i s  an  upper  bound  on 
Q  +  R.  In  the  absence  of  any  information  at  all  on  the 
order  of  magnitude  of  the  noise  variances  (which  is  a 
purely  theoretical  situation  since  in  most  practical 
cases  the  modeler  would  have  at  least  some  idea  of  the 
upper  bounds  of  such  errors),  it  is  suggested  here  to 
use  the  matrix  S  to  obtain  values  for  R  and  Q  based  on 
the  approximation  S  =  Q  +  R  (very  rough  in  the  case  of 
large  parameter  values). 

One  could  then  advance  by  trial  and  error  in 
assessing  an  approximate  value  for  the  ratio,  guided  by 
the  value  of  the  Likelihood  function. 

Again,  it  would  help  here  to  have  an  idea  of  the  ratio 
Q/ R  .  .  . 

■  Experimental  results. 

Attempts  at  estimating  various  combinations  of  A,  Q  and 
R  in  the  one-dimensional  model  can  be  summarized  in  the 
following  qualitative  way. 

•  If  Q  is  estimated  alone  or  along  with  A,  the  final 
value  of  Q  is  negative  with  a  large  standard  deviation, 
or  zero  if  Q  is  constrained  to  remain  positive.  The 
estimate  for  A  is  good. 

•  If  R  is  estimated  alone  or  with  A,  the  final  values 
for  both  parameters  are  acceptable. 

•  If  Q  and  R  are  estimated  alone,  Q  is  still  negative 
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and  R  is  acceptable. 

•  If  A,  Q  and  R  are  estimated,  the  algorithm  does  not 
converge  unless  Q  is  constrained  to  remain  positive. 

Table  X-5  presents  a  comparison  of  the  estimates  of 
A,  Q  and  R  with  1%  and  10%  equation  noise  (the 
measurement  noise  is  Kept  at  10%  in  both  cases). 

In  the  Q=1%  case,  with  Q  constrained  to  remain  positive, 
A  and  R  are  estimated  very  accurately  and  are  within  one 
standard  deviation  of  their  true  values. 

With  Q= 1 0% ,  the  estimate  of  Q  is  now  much  better,  but 
the  value  and  standard  deviation  of  A  have  worsened,  as 
expected  from  the  results  of  the  plots  of  the  Likelihood 
f unct i on . 

In  the  light  of  these  results  with  a  simple 
one-dimensional  model,  we  can  expect  difficulties  in  the 
estimation  of  the  of  the  noise  statistics  for  higher 
order  systems.  This  situation  is  further  examined  with 
our  next  model,  a  two-dimensional  oscillator. 

B.2  Two  Dimensional  Oscillator 
1 .  The  model 

This  is  a  very  simple  model  destined  to  make  sure  that 
the  algorithm  behaves  properly  with  a  linear  model,  when 
there  is  interaction  between  equations,  before  attempting  to 
try  it  with  a  more  complex  nonlinear  model.  Again,  the 
simplicity  of  this  model  and  its  small  size  make  it  possible 
to  examine  thoroughly  the  behavior  of  the  algorithm  under 
different  conditions. 


Table  X-5 


One-dimensional  model 

Estimates  of  A,  0  and  R  for 
1%  and  10%  equation  noise 


No  i  se 

Cond i t i ons 

A 

0 

R 

L i ke 1 i hood 

iter. 

0  =0.0025 
R  =  0.25 

A  =  .  8 

Ao  =  .  6 

Oo  ideal 

Ro  =  0.5 
SEo  =0.5 
XEo  =  Z1 

0.8218 
(0.0123 ) 

0 

0.2600 
(0.0382 ) 

-23 . 39 

5 

0  =  0.25 

R  =  0.25 

A  =  .8 

Ao  =  .6 

Oo  =  0.1 

Ro  =  0.1 

SEo  ideal 
XEo  ideal 

0.6923 

(0.0342) 

0. 1709 
(0.0713) 

0.2894 

(0.0783) 

17.53 

6 

Cond i t 1 ons : 


100  data  points 


*'"■  V 
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The  model  is: 


TX  1 

=  XI 

+  0. 

.  0 1 *X2 

+  U1 

(30) 

TX2 

=  X2 

-  0, 

.  0 1 *X 1 

+  U2 

(31  ) 

and : 


Y  1  =  T  X  1 

+  VI 

(32) 

Y  2  =  TX2 

+  V2 

(33) 

Where  U  and  V  are  random  seqences  generated  by  the  IMSL 
subroutine  GGNOF  using  different  seeds  and  with  standard 
deviations  varying  with  experimental  conditions. 

The  unknown  parametrs  are  A1=  0.01,  A2=  -0.01  and  the 
variances  of  the  noise  sequences  are  equal  to  1%  and  10%  of 

the  mean  value  of  XI  or  X2:  x  =  1.5,  so  that: 

su  -  0.015  and  Q  =  s^  =  0.000225 

sv  =  0.15  and  R  =  sj  =  0.0225 

The  initial  values  for  XI  and  X2  are  chosen  as  2.  and  0. 
respectively.  The  trajectory  of  the  model  with  and  without 
the  noise  inputs  is  illustrated  in  Figure  X-7. 

2 .  OLS  est imates 

The  OLS  estimates  of  this  simple  model  with  1%  equation 
noise  and  10%  measurement  noise  are  presented  in  Table  X-6, 
along  with  the  seed  numbers  for  GGNOF. 

The  estimate  of  A1  is  very  bad,  being  in  error  by  a  factor 
of  about  3  and  is  also  insignificant  (probability  that 
A 1  =  0  is  85%  !)  The  fit  is  very  low,  thus  indicating  the 
drastic  effect  of  the  noise  on  this  simple  linear  model. 


■ 

5*f  I 
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FIGURE  X-7 :  Two-dimensional  model : 


TABLE  X-6 


Two-dimensional  Oscillator 
OLS  Estimates 


Cond i t i ons 

> 

II 

o 

o 

O 

o 

II 

CM 

< 

Nj) 

II 

O 

0.00367 

0.0103 

R  =  1 0% 

(0.019 ) 

(0.0108 ) 

R '  =0 . 0008 

R! =0.01 15 

0=10% 

-0.0898 

-0.0054 

R  =  1 0% 

(0.058 ) 

(0.011) 

R*  =0.0256 

R*  =0.0073 

O 

ii 

0.0100 

-0.01 

no  measurement 

(0.00) 

(0.0) 

no  i  se 

R  ?  =  1 .0 

R  ?  =  1 

The  standard  deviations  are  in  Brackets 
Cond i t i ons : 

100  data  points 

The  seed  numbers  for  GGNOF  are: 

U1  =  14785  U2  =  46583 
VI  =  63976  V2  =  74372 
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The  estimate  of  A2  is  much  better  but  it  is  also  very 
insigni f icant . 6 1 

Running  the  program  with  10%  equation  noise  results  in  a 
drastic  worsening  of  the  estimates  (A1  now  has  the  wrong 
sign  )  as  also  shown  in  Table  X-6.  The  estimates  of  A1  and 
A2  with  only  equation  noise  are  shown  for  comparison. 

3.  LIKALM  estimates 

A  first  run  was  attempted  using  the  LIKALM  program  to 
estimate  the  two  unknown  parameters  A1  and  A2 .  In  this  first 
run,  the  actual  values  of  the  parameters  were  used  as  a 
starting  point  for  the  iteration.  The  covariance  matrices  of 
the  equation  and  measurement  noises  Q  and  R  were  assumed 
known,  diagonal,  with  diagonal  elements: 

for  Q:  q„  =  qzl  =  0.000225 
for  R :  r„  =  r2i  =  0.0225 

The  program  also  needs  starting  values  for  the  Kalman 
Filter:  the  initial  value  of  the  state  vector  XE  and  its 
covariance  matrix  SE . 

■  For  this  first  run,  XEo  was  taken  as  the  actual  starting 
value  of  the  model  and  therefore  since  XEo  is  known  with 
certainty,  SEo  =  0. 

Under  these  perfectly  ideal  conditions,  the  program 
converges  to: 

A 1  =  0.00794  ±  0.0018 
A2  =  -0.00910  ±  0.00017 


61  These  estimates  are  very  bad,  but  Table  X-12  presents 
estimates  with  different  noise  sequences  that  are  better  but 
still  have  the  same  standard  deviations 


. 
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with  a  value  of  the  Negative  Likelihood  function  of: 

L  =  -298.4. 

An  approximation  to  the  standard  deviations  is  given  by  the 
square  roots  of  the  diagonal  elements  of  the  inverse  of  the 
Information  matrix.  Two  iterations  were  necessary  to  obtain 
covergence.  (Convergene  is  decided  manually  and  was  obtained 
when  the  Likelihood  function  showed  no  change  and  when  the 
elements  of  the  incremental  step  vector  were  smaller  than  a 
specified  value  chosen  as  1%  of  each  parameter  value  ).  The 
cost  of  running  the  program  for  this  model  with  100  data 
points  is  $0.40  per  iteration. 

These  values  are  much  better  than  the  OLS  values  indicated 
in  Table  X-5,  and  although  obtained  under  very  ideal 
conditions,  demonstrate  the  capacity  of  the  LIKALM 
estimation  to  improve  on  the  traditional  OLS  estimation  for 
this  two-dimensional  oscillator.  The  standard  deviations 
also  indicate  acceptable  parameter  estimates. 

Having  obtained  convergence  under  these  ideal 
conditions,  the  performance  of  the  method  (robustness) 
should  now  be  tested  by  relaxing  them  one  by  one  in  order  to 
investigate  their  effect  on  the  estimates.  More 
speci f ical ly : 

1.  With  ideal  (actual)  values  for  Q,  R,  XEo  and  SEO,  try 
different  starting  point  for  the  parameter  vector  and 
draw  a  zone  of  convergence 

2.  With  ideal  Q  and  R,  OLS  starting  values  for  A,  asses  the 
effect  of  using  the  first  available  data  point  for  XEo 
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and  SEO  =  R 

3.  Estimate  Q  and  R  along  with  A. 

4.  With  XEo,  SEO  and  A  as  previously  defined,  asses  the 
effect  of  wrong  values  for  Q  and  R. 

This  last  approach  has  been  used  here  because,  as 
explained  in  section  (X-B.1),  the  estimation  of  Q  and  R  has 
proven  problematic. 

Tables  X-7  to  X - 1 1  present  the  results  of  these  runs 
using  the  same  noise  sequence  as  for  the  OLS  estimates.  The 
effect  of  using  different  noise  sequences  is  illustrated  in 
Table  X-12.  The  results  reported  in  this  last  table  are  for 
the  worse  conditions  that  is: 

Ao  =  OLS 

Q  =  R  =  Sample  covariance  matrix,  S 
XEo  =  Z1 
SEo  =  R 

4.  Analysis  of  results 

■  Table  X-7  shows  that  for  this  particular  linear 
model,  the  program  converges  to  the  same  final  values  for  a 
very  wide  range  of  starting  points  in  only  3  to  4  iterations 
(all  other  values  being  Kept  ideal  ).  The  worst  case  (which 
still  allows  for  convergence)  is  when  A1  and  A2  take  ten 
times  their  actual  values,  with  the  wrong  sign!  As  can  be 
seen  from  this  table,  there  was  no  convergence  for  the  last 
two  values  tried  ;  however,  the  conditions  in  these  cases 
were  rather  drastic  (a  hundred  times  A1  and  A2,  with  the 
wrong  sign  and  A 1  =  5 ,  A2  =  0  ) . 
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TABLE  X-7 

Two-dimensional  Oscillator 
LIKALM  Estimates 

Convergence  zone  under  ideal  conditions 


A  1 

A2 

A  1 

A  2 

Like! i hood 

0.01 

-0.01 

0.00794 
(0.00187 ) 

-0.00910 

(0.00090) 

-298 . 4 

0.00367 

-0.0103 

0.00794 
(0.00187 ) 

-0.00910 
( 0 . 00090 ) 

-298 . 4 

0.05 

0.0 

0.00794 

(0.00187) 

-0.00910 
( 0 . 00090 ) 

-298 . 4 

0.05 

0.05 

0.00794 

(0.00187) 

-0.00910 
( 0 . 00090 ) 

-298 . 4 

-0.  1 

0.  1 

0.00794 
(0.00187 ) 

-0.00910 

(0.00090) 

-298 . 4 

-1.0 

1  .0 

d i vergence 

5 . 0 

0.0 

d i vergence 

Cond i t i ons : 

All  i dea 1 

100  data  points 
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The  conclusion  for  this  set  of  runs  is  that  the 
algorithm  is  not  very  sensitive  to  the  choice  of  a  starting 
point.  A  bad  OLS  starting  value  is  good  enough  to  obtain 
convergence  in  this  case.  This  is  due  to  the  fact  that  the 
model  itself  is  very  stable  (linear  oscillator)  and  not  very 
sensitive  to  parameter  variations  as  can  be  seen  from  Figure 
X-8  which  illustrates  the  behavior  of  the  model  without 
noise  inputs  in  the  original  model  and  if  the  parameters  are 
replaced  by  their  OLS  estimates. 

■  Table  X-8  demonstrates  the  effect  of  using  the  first 
data  point  instead  of  the  unavailable  initial  state  XEo,  and 
to  set  SEo  (the  covariance  matrix  of  the  random  variable 
XEo)  equal  to  R,  the  covariance  matrix  of  the  noise. 

This  choice  makes  sense  since  the  measurements  (z(  1  )  in 
this  case)  consist  of  the  true  value  of  the  state  vector 
x(  1  )  ,  plus  noise.  The  error  covariance  SE ( 1 / 1 )  is  then  given 
directly  as: 

SE ( 1/1 )  =  E { [ x ( 1  )  -  x( 1/1 ) ] [x( 1 )  -  x( 1/1 ) 1'  } 

=  di  ag  (  r 4|  ,  r2l  )  (  34  ) 

where  r;j  is  the  i-th  diagonal  element  of  the 
measurement  noise  covariance  matrix  R.  (see  Mehra  [84]). 

The  parameter  values  in  Table  X-8  along  with  their 
standard  deviations  show  little  change  from  the  ideal 
case:  the  minimum  attained  is  now  a  little  higher 
(L  =  -297.23).  In  this  run,  the  OLS  starting  values  were 
used . 

■  Estimation  of  Q  and  R. 


. 
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FIGURE  X-8:  Two  dimensional  model:  sensitivity  of  model  to  changes  in  the  parameters. 
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TABLE  X- 8 

Two-Dimensional  Oscillator 
LIKALM  Estimates 

Wrong  initial  values  for  XEo  and  SEo 


Resu 1 ts 

A 1  =  0.00787  ±  0.00187 
A2  =  -0.00959  ±  0.00096 

Likelihood  =  -297.23 


Condi t ions 

100  data  points 
Q  ideal 
R  ideal 

Ao  =  OLS  values 
XEo  =  Z1 
SEo  =  R 


. 
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Having  estimated  the  parameter  vector  A  wi th  Q  and  R 
fixed  and  specified  at  their  ideal  theoretical  values, 
the  estimation  of  Q  and/or  R  along  with  that  of  A  was 
attempted  in  order  to  confirm  the  one -dimensional 
resu Its. 

All  attempts  at  estimating  Q  and/or  R  with  A  fixed 
or  also  estimated,  using  the  first  form  of  the 
information  matrix,  M , ,  led  either  to  divergence 
problems  or  to  a  nonpositive  definite  information 
matrix,  and  the  ensuing  computaional  problems.  This 
confirms  the  results  obtained  with  the  one-dimensional 
model,  where  the  second  derivative  of  the  Likelihood 
function  was  negative  at  the  minimum. 

The  use  of  the  second  form  of  the  information 
matrix,  ,  led  to  much  better  results. 

Since  the  noise  levels  were  chosen  equal  for  both 
equations,  Q  and  R  were  taken  as  diagonal  with  equal 
diagonal  elements: 

q„  =  q22  =  0.000225 

r ||  =  r2z  -  0,0225 

The  estimation  results  are  reported  in  Table  X-9. 

The  best  minimum  attained  by  the  negative  Likelihood 
function  is  when  Q  is  allowed  to  be  negative,  which  confirms 
our  one-dimensional  results.  On  the  other  hand,  the 
estimation  of  R  along  with  A  yields  a  very  good  value  for  R, 
specially  if  we  consider  that  the  sample  covariance  matrix 
of  the  measurement  noise  (with  hundred  data  points)  is  equal 
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to  0.016  (instead  of  the  theoretical  value  of  0.025). 

If  Q  is  constrained  to  remain  positive,  the  estimation  of  A, 
Q  and  R  with  ideal  or  approximate  values  for  XEo  and  SEo, 
and  a  starting  point  at  the  OLS  values,  yields  good 
estimates  for  A  and  Q.  But  Q  is  set  ot  zero  during  the 
convergence  process  and  ends  up  with  a  small  and 
insignificant  positive  value. 

■  If  Q  and  R  present  estimation  problems,  an  other 
approach  might  be  to  choose  "ad-hoc"  values  Known  "a  priori" 
as  is  the  usual  practice.  However,  these  values  might  be 
wrong . 

Table  X - 1 0  illustrates  the  effect  of  choosing  wrong  values 
for  Q  and  R  (taken  as  fractions  or  multiples  of  their  actual 
values);  in  all  these  runs  Ao  is  set  at  its  OLS  value,  XEo 
and  SEo  are  ideal.  The  program  still  converges  in  3 
iterations  in  all  the  runs  tried.  However,  the  final  values 
are  now  affected  by  the  choice  of  Q  and  R. 

As  far  as  the  algorithm  is  concerned,  the  equation 
noise  seems  to  be  irrelevant.  The  best  estimate  is  obtained 
when  R  is  equal  to  its  true  value  and  Q  is  smaller  than  the 
true  value.  However,  variations  in  R  when  Q  is  set  at  its 
true  value  affect  the  Likelihood  function  drastically.  The 
fact  that  the  Likelihood  function  is  much  more  sensitive  to 
R  than  to  Q  is  due  to  the  fact  that  Q  is  not  identifiable  as 
was  shown  by  the  one-dimensional  example.  Thus  this 
experiment  again  verifies  our  previous  results. 

■  Table  X- 1 1  shows  the  effect  of  varying  the  ratio  Q/R 


. 
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TABLE  X-10 

Two-dimensional  Oscillator 
LIKALM  Est imates 
Wrong  values  for  0  and  R 


Cond i t i ons 

o 

o 

II 

< 

A2  =  -0.01 

L i ke 1 i hood 

R  ' 

=  R 

0.00740 

-0.00899 

-285 . 9 

O' 

=  100 

(0.00528 ) 

(0.00276) 

R  ' 

=  R 

0.00689 

-0.00889 

-242 . 2 

O' 

=  R 

(0.01616) 

(0.00878 ) 

R  ' 

=  10R 

0.00874 

-0.00896 

-139.0 

0' 

=  0 

(0.00476) 

(0.00170) 

R  ' 

=  1 00R 

0.00972 

-0.00884 

82 . 32 

0' 

=  0 

(0.00167 ) 

(0.00087 ) 

R  ' 

=  0.  1R 

0.00741 

-0.00899 

47.41 

0' 

=  0 

(0.00253) 

(0.00101 ) 

R  ' 

=  10R 

0.00648 

-0.00874 

-49  .  12 

O' 

=  100 

(0.0515) 

(0.0275 ) 

R  ' 

=  R 

0.00874 

-0.00896 

-300.06 

0' 

=  0.10 

( 0 . 0008 ) 

(0.00032 ) 

Cond i t i ons : 

Ao  =  0L5 

XEo ,  SEo:  ideal 
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based  on  the  approximation  Q  +  R  =  S,  as  explained  in 
section  (X-B.1).  Setting  R  >  Q  improves  the  standard 
deviation  of  the  estimates  (which  is  a  criterion  more 
suitable  for  comparison  than  the  actual  parameter  values 
[84]).  Setting  R  = 1 0 0 • Q  (that  is,  the  measurement  noise  equal 
ten  times  the  equation  noise,  which  is  the  actual 
experimental  condition)  drastically  improves  the  standard 
deviations,  with  only  slight  improvement  if  Q  is  set 
smaller.  This  indicates  to  the  modeler  that  the  major  error 
in  this  case  is  probably  due  to  measurement  noise.  The 
modeler  would  probably  choose: 

A 1  =  0.00856  ±  0.00071 
A2  =  -0.00947  ±  0.00035 

which  have  the  smallest  standard  deviations.  These  values 
are  within  two  standard  deviations  of  the  true  values  and 
give  the  best  minimum  for -the  Likelihood  function. 

■  Finally,  Table  X - 1 2  demonstrates  the  effect  of  using 
different  noise  sequences  with  Ao  =  OLS  values,  XEo  =  Z(1), 
SEo,  Q  and  R  set  as  indicated  in  the  previous  experiment 
using  the  values  of  S  and  S/100. 

All  OLS  estimates  are  insignificant  and  all  are 
improved  by  the  LIKALM  procedure.  Although  the  values  of  the 
estimates  varies,  the  standard  deviations  of  the  estimates 
are  comparable  for  all  the  noise  sequences. 

5 .  Cone 1 us i on 

It  is  apparent  from  this  simple  linear  example  that  the 
LIKALM  procedure  drastically  improves  the  estimates  of 


.  X 
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TABLE  X  -  1 1 

Two-dimensional  Oscillator 
LIKALM  Estimates 

Different  Combinations  of  0  and  R  based  on 
the  Least  Squares  Residual  S 


Cond i t i ons 

A 1  =  0.01 

A2  =  -0.01 

L i ke 1 i hood 

0  =  S  =  0.034 

R  =  S  =  0.034 

0.00686 
(0.01987 ) 

-0.00906 

(0.01072) 

-214.43 

0  =  0.017 

R  =  0.017 

0.00690 
(0.01404 ) 

-0.00907 
(0.00758 ) 

-256 . 44 

0  =  0.03 

R  =  0.003 

0 . 005 1 8 
(0.01843 ) 

-0.00968 

(0.01000) 

-244 . 88 

0  =  0.003 

R  =  0.03 

0.00739 
(0.00609 ) 

-0.00941 

(0.00325) 

-271  .72 

0  =  0.0003 

R  =  0.03 

0.00787 
( 0 . 002 1 6 ) 

-0.00958 

(0.00111) 

-286.08 

0  =  0.00003 

R  =  0.03 

0.00837 

(0.00099) 

-0.00951 

(0.00049) 

-289  .  13 

0  =  3X 10' 6 

R  =  0.03 

0.00854 
(0.00074 ) 

-0.00948 
(0.00036 ) 

-289.66 

0  =  3X 10' 7 

R  =  0.03 

0.00856 
( 0 . 0007 1 ) 

-0.00947 

(0.00035) 

-289 . 70 

Cond i t i ons : 

100  data  points 
Ao  =  OLS 
XEo  =  Z1 
SEo  =  R 


r  ^ 


OLS  and  LIKALM  estimates  with  different  noise  sequences 
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parameters  and  their  confidence  intervals,  in  the  presence 
of  measurement  noise,  if  the  modeler  is  in  a  position  to 
specify  reasonable  bounds  for  the  covariance  matrices.  There 
is  an  identification  problem  if  we  attempt  to  estimate  all 
the  unknown  parameters  (including  the  covariance  matrices) 
since  the  data  we  have  is  insufficient  to  provide  for  the 
estimation  of  all  the  parameters. 

Having  gained  some  confidence  in  the  performance  of  the 
program  for  this  simple  case,  it  was  then  attempted  to  apply 
the  method  to  a  nonlinear  case. 

B.3  Three-Dimensional  Nonlinear  Model 

The  cost  of  running  the  Market -Growth  model  being  very 
high,  the  following  three  dimensional  model  was  chosen  to 
determine  the  behavior  of  the  algorithm  under  a 
specification  very  similar  to  the  Market -Growth  model. 

1 .  The  model 


The  equations  of  this  three  dimensional  nonlinear  model 
are,  in  their  FORTRAN  form: 


TX 1 

=  XI 

+ 

DT  * 

( 

A 1  *X 1 *X2  +  A2*X22/X3 

+  U1  ) 

(35) 

TX2 

=  X2 

+ 

DT  * 

( 

A3*X3/X 1  +  A4*X2  +U2 

) 

(36) 

TX3 

=  X3 

+ 

DT  * 

( 

A5*X 1 /X2  2  +A6*X2  +U3 

) 

(37) 

and : 

Y  1  = 

TX  1 

+ 

VI 

(38) 

Y2  = 

TX2 

+ 

V2 

(39) 

Y  3  = 

TX3 

+ 

V  3 

(40) 

where : 

. 
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A 1  =  -0.001326 
A2  =  0.0424 
A3  =  34.57 
A4  =  -1.45 
A5  =  4.3217 
A6  =  0.4187 

with  Ui  and  Vi  random  sequences  generated  by  GGNOF  with 
different  seed  numbers  and  standard  deviations  equal  to  1% 
and  10%  of  the  mean  of  the  relevant  level  variables 
respect i ve ly . 

The  starting  values  of  the  model  are: 

X1C  =  4. 

X20  =  150. 

X30  =  1. 

The  model  exhibits  a  nonlinear  behavior  illustrated  by 
Figure  X-9.  with  DT  =  0.2 

Since  the  equation  noise  is  also  multiplied  by  DT  in 
the  data  generating  model,  the  standard  deviations  of  the 
equation  noise  sequences  in  the  GGNOF  functions  were 
adjusted  by  dividing  them  by  DT.  The  theoretical  (rounded 
off)  values  for  Q  and  R  are: 

q  ii  =  2-25 

q21  =  0.0625 
q  33  =  2-25 

and 

r  n  =  225. 


6.25 
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2.  OLS  estimates 

The  results  of  the  OLS  runs  with  100  data  points  are 
presented  in  Table  X -  1 3 . 

All  parameters  in  equations  1  and  3  are  insignificant  and 
deviate  strongly  from  their  true  values.  The  parameters  in 
equation  2  are  much  better  and  deviate  from  their  true 
values  only  by  a  factor  of  1.2.  Therefore,  the  modeler  would 
have  probably  accepted  the  estimates  in  equation  2  and 
rejected  those  in  equation  1  and  3.  He  has  however  no  way  of 
improving  on  the  specification  of  equation  1  and  3  that  are 
already  perfectly  specified. 

3.  LIKALM  runs 

Table  X - 1 4  illustrates  the  results  of  the  LIKALM  runs 
for  4  cases  correspondi ng  to  progressive  relaxation  of  ideal 
condi t ions . 

a .  Run  1 :  all  idea  1 

When  the  program  is  run  with  ideal  values  for 
A0,  XE0,  SE o,  Q  and  R,  it  converges  to  very  good 
parameter  estimates  in  three  iterations. 

All  parameters  are  significant  at  the  95%  level  of 
probability  and  fall  within  1  to  2  standard 
deviations  from  their  true  values.  The  program 
therefore  theoretically  has  the  capacity  of 
estimating  such  a  nonlinear  model. 

b.  Run  2:  OLS  starting  values 


. 

. 


■ 
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TABLE  X -  1  3 

Three-dimensional  model 
OLS  estimates 


Parameter 

True  value 

OLS 

( st .  dev .  ) 

R  ? 

DW 

SSR 

A  1 

-0.001326 

-0.0030 

0.0193 

(0.0026) 

3 .04 

59469 . 

A2 

0.0424 

0. 1735 

59469 . 

(0. 1685) 

A3 

34 . 57 

29.07 

0.6032 

(2.42) 

2 . 68 

A4 

-1.45 

-1.27 

1144. 

(0.11) 

A5 

4.3217 

1 . 1262 

0.0010 

( 10.906) 

2 . 92 

A6 

0.4187 

0 . 4190 

47813 . 6 

(0.2939) 

Cond i t i ons : 

100  data  points 

0=1%.  R  = 1 0% 

VI  =  36597 
V 2  =  21485 
V3  =  724195 


Noise  sequence  seeds: 


U1 

U2 

U3 


43567 

56493 

52486 


-dimensional  model 
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The  algorithm  converges  to  the  same  final  values  as 
i n  run  1 . 

c.  Run  3:  Ao  =  OLS,  XEo  =  Z  and  SEo  =  R. 

The  final  values  are  now  worse  (specially  for 
equation  2  where  OLS  values  are  comparable  )  but, 
except  for  equation  2,  all  fall  within  1  to  2 
standard  deviations  of  their  true  values. 

d.  Run  4:  Ao  =  OLS,  XEo  =  Z  ,  SEo  =  R,  R  =  S62  and 
Q  =  S/100. 

The  final  values  are  very  similar  to  those  obtained 
for  run  3,  but  the  standard  deviations  are  now  much 
larger  expressing  a  greater  uncertainty  in  the 
quality  of  the  estimates.  This  is  to  be  expected 
since  the  values  of  S  are: 

S„  =  594. 

S22  =  11.4 
S3J  =  478. 

so  that  even  with  a  perturbation  of  about  100%  in 
the  values  of  Q  and  R,  the  algorithm  still 
converges . 

4 .  Cone  1 us i on 

From  the  above  results  with  the  nonlinear 
three-dimensional  model,  it  appears  that  the  LIKALM 
program  has  the  capacity  to  improve  drastically  over  OLS 
estimates  of  a  nonlinear  model  in  the  presence  of 
equation  and  measurement  noise.  Although  the  behavior  of 


6 2 See  page  169 
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this  three-dimensional  model  is  not  as  unstable  as  that 
of  the  Market -Growth  model,  the  non  1 i near i t i es  present 
here  are  of  the  same  kind  and  can  be  taken  as  an  example 
of  the  non  1 i near i t i es  generally  found  in  system  dynamics 
mode  1 s . 

The  instability  of  the  model,  the  singularities 
that  appear  are  typical  of  what  can  be  expected  from  a 
system  dynamics  model . 

Although  Q  and  R  were  not  estimated  along  with  the 
other  parameters,  the  use  of  the  residual  noise  process 
covariance  matrix  S  as  an  approximation  for  R  +  Q,  shows 
that  a  wide  range  of  variation  in  their  values  is  still 
possible  and  yields  acceptable  results. 

A  whole  range  of  new  problems  is  to  be  expected  as 
the  size  of  the  model  approaches  those  usually 
encountered  in  "Real -Life"  models.  The  following 
experiments  with  the  Market -Growth  model  illustrate  this 
fact  very  well  (although  this  model  is  still  of  modest 
size  ) 

B.4  The  Market -Growth  model 

As  it  is  written,  the  Market -Growth  model  does  not 
directly  lend  itself  to  estimation  in  the  state-space  form 
since  equation  ( V 1 1  - 8 )  is  not  a  state  equation  but  merely  an 
identity63.  The  variable  DDC  had  to  be  substituted  into 
equation  ( V 1 1 - 4 )  (the  only  equation  where  DDC  appeared)  thus 
reducing  the  system  to  nine  equations.  The  parameter  values 


6  3 


See  equations  of  the  model  page  77 


. 


196 


in  equation  (VI I -4)  are  therefore  changed  and  the  new  form 
of  the  equation  is  as  follows: 

dPCO0 1  =  DT» [ K'  1 4»PC ( -DT )  +  K'  1 5»PC ( -DT ) »DDRC ( -DT ) 

+  K7  1 6«PC ( -DT ) »DDRC ( -DT ) 2  +  K7  1  7»PC ( -DT ) »DDRC ( -DT ) 3 
-  0 . 25*PC00 1 (-DT)  +  N4 ]  (41 ) 

Where : 

K7  14  =  -0 .  11517 
K7  15  =  0.09026 
K7  16  =  -0.02643 
K7  17  =  0.00338 

All  the  other  equations  remain  the  same. 

1 .  OLS  runs 

Although  the  model  has  been  thoroughly  estimated 
with  OLS  in  section  ( V 1 1  - B )  new  OLS  runs  had  to  be  tried 
with  the  new  form  of  equation  (4)  and  the  new  data 
generated  by  the  model .  The  data  generated  by  the  two 
versions  of  the  model  differ  slightly  due  to  different 
noise  sequences. 

The  noise  sequences  were  generated  as  indicated  in  Table 
X-  1 5 . 

The  OLS  results  with  this  new  version  of  the  Market 
model  are  presented  in  Table  X - 1 6  and  can  be  compared  to 
those  indicated  in  Tables  V 1 1  - 8  to  V 1 1  -  1 1 . 

Since  the  two  data  sets  differ,  the  estimates  for 
all  equations  are  not  the  same  as  previously.  Estimates 
for  K1,  K2  and  K3  are  now  significant  and  better  than 
previously  estimated.  All  estimates  for  equation  2  are 
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TABLE  X -  1  5 


Noi  se 

Sequences  for  Market  model 

Seed 

numbers  for  GGNOF 

rout i ne 

Equation  noise 

Measurement 

noise  for 

947312 

67751 

BL 

2345 

93023 

DRA 

6312 

10735 

S 

95642 

76925 

PC001 

44872 

5675 

DDRC 

443521 

22768 

DDRM 

31356 

PC 

Market  model 
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0=1%.  R= 10% 
t  is  the  t-statistic 

(  R  7  )  is  for  estimation  in  the  first  difference  form  (see  section  VH) 
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still  significant  but  worse  than  their  previuos 
counterpar ts .  The  values  of  K10  and  K11  in  equation  3 
are  identical  in  both  cases.  The  coefficients  in 
equation  4  cannot  be  compared  since  they  now  have 
different  values.  However,  they  still  all  have  the  wrong 
sign  and  are  all  insignificant.  Overall  the  OLS 
estimates  obtained  for  this  single  run  confirm  the 
results  obtained  earlier  as  a  verification  of  Senge7 s 
results  (see  section  (VI-B.2)). 

2.  LIKALM  runs 

In  order  to  avoid  computaional  problems  with  the  LIKALM 
program  (due  to  the  great  number  of  matrix  operations 
and  the  wide  disparity  in  variable  values64),  the  data 
was  scaled  as  follows: 

BL#  =  BL/100 
DRA7  =  DRA/100 
S7  =  S 

PC0017  =  PC001/10 
PC0027  =  PC002/10 
PC0037  =  PC003/10 
PC7  =  PC/100 
DDRC7  =  1 0*DDRC 
DDRM7  =  1 0*DDRM 

The  new  parameter  values  to  be  estimated  are  presented 
in  Table  X- 1 7 . 

OLS  estimates  with  this  new  scaled  data  are  very 


Ranging  from  0.4  to  120000 
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TABLE  X -  1 7 
Market-Growth  Model 
Scaled  parameter  values 


Parameter 

True  value 

Scaled  value 

K  1 

475 

4 . 75 

K2 

-61.5 

-0.00615 

K3 

-0.6178 

-0.6178 

K4 

0. 1324 

0.  1324 

K5 

-0.00975 

-0.00975 

K6 

0.6178 

0.6178 

K7 

-0. 1324 

-0. 1324 

K8 

0.00975 

0.00975 

K9 

-1.0 

-1.0 

K  10 

3 . 10x10‘ 1 

0.03 

K  1  1 

-0.05 

-0.05 

K  1  2 

-0. 11517 

-1.1517 

K  1  3 

0.09026 

0.09026 

K  14 

-0.02643 

-0.002643 

K  15 

0.00338 

0.000034 

• 
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similar  (but  scaled)  to  those  presented  in  Table  X - 1 6 
and  can  be  found  in  Table  X-18. 

The  cost  of  running  the  LIKALM  program  with  the 
entire  Market -Growth  model  being  very  high65,  only  50 
data  points  were  used  in  the  following  experiments. 

As  a  first  experiment,  the  equations  were  estimated 
one  at  a  time,  under  ideal  condi tons66  all  other 
parameters  being  fixed  at  their  true  values. 

The  estimation  results  are  presented  in  Tables  X - 1 9 
to  X  -  2 1 . 

In  all  cases,  the  program  converges  to  values  that  are 
very  close  to  and,  in  most  cases,  within  one  standard 
deviation  of  the  true  values.  The  accuracy  of  this 
method  as  compared  to  OLS  in  striking,  in  particular  for 
equations  3  and  4.  Equation  4,  however  seems  to  present 
some  convergence  difficulties,  and  although  the 
estimates  are  much  better  than  in  OLS,  K14  and  K15  are 
still  inacceptable  even  though  the  standard  deviations 
seem  to  indicate  little  error. 

Having  thus  established  the  capacity  of  the  program 
to  converge  to  a  suitable  minimum  with  this  complex  and 
highly  nonlinear  model,  for  each  individual  equation, 
estimation  of  the  parameters  in  the  first  three 
equations  was  then  attempted.67 


65  $60  per  iteration  for  50  data  points 

66  See  section  (X-B.2)  for  definition  of  "ideal" 

67  Equation  4,  which  had  dispalyed  a  bad  behavior  in  the 
first  runs,  was  left  out  here 
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The  results  of  this  run  (still  under  ideal 
conditions),  in  Table  X-22,  do  not  indicate  much  change 
from  the  previous  single  equation  runs. 

The  introduction  of  equation  4  in  the  set  of 
parameters  to  be  estimated  does  however  tend  to  alter 
the  estimation  results  and  the  rate  and  conditions  of 
convergence . 

Because  of  the  high  cost  of  the  runs,  only  a  few 
experiments  were  attempted  with  the  entire  model. 

1.  Runs  with  Q=1%  and  R  = 1 0% ,  and  ideally  specified. 

a.  If  the  initial  values  are  chosen  as  the  OLS 
values  from  Table  X - 1 8 ,  with  XE0  =  Z1  and 
SE^  =  R,  the  estimation  problem  is  so 
ill-conditioned  that  the  algorithm  runs 
consistently  into  divergence  or  numerical 
singularity  problems. 

b.  If,  all  other  conditions  remaining  the  same,  the 
signs  of  the  parameters  in  equation  4  are 
changed  to  correspond  to  their  true  values,  the 
algorithm  still  faces  the  same  problems. 

c.  If  now,  the  starting  point  is  changed  to  the 
true  values  altered  by  10%  (  A  -  10%),  the 
algorithm  converges  after  a  slight  overshoot, 
and  the  values  after  5  iterations  are  reported 
in  Table  X-23  along  with  the  true  and  OLS 
values.  The  successive  values  taken  by  the 
negative  Likelihood  function  are: 
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TABLE  X-23 


Market  model 

LIKALM  estimation  of  entire  model:  Ao  =  A  -  10% 


Parameter 

T  rue  va 1 ue 

OL  5 

( st .  dev  .  ) 

LIKALM 
( st . dev .  ) 

K  1 

4 . 75 

3 . 73 
(0.93 ) 

4 . 30032 
(0.00030) 

K2 

-0.0615 

-0.0506 
(0.0123 ) 

-0.05506 

(0.00003) 

K3 

-0.6178 

-0 . 406 1 
(0. 1783 ) 

-0 . 54395 
(0.00155 ) 

K4 

0. 1324 

0.0645 
(0.0394 ) 

0. 10007 
( 0 . 00002  ) 

K5 

-0.00975 

-0.00367 
(0.00325 ) 

-0.00759 
( 0 . 00006  ) 

K6 

0.6178 

0.3367 
(0.0599  ) 

0 . 54925 
( 0 . 00007 ) 

K7 

-0. 1324 

-0.0686 

(0.0136) 

-0.11310 
(0.00035 ) 

K8 

0.00975 

0.00485 

(0.00112) 

0.00827 
( 0  00002 ) 

K9 

-1.0 

-0.53 

(0.10) 

-0.90009 
( 0 . 0000 1  ) 

K  10 

0.03 

0.007 
(0.028 ) 

0 . 02803 
( 0 . 000 1 7 ) 

K  1  1 

-0.05 

-0. 255 
(0.065) 

-0.04506 
( 0 . 0000 1  ) 

K  1  2 

-1.1517 

0.7818 
(0.5471 ) 

-  1 .00308 
(0.00125 ) 

K  1  3 

0.09026 

-0.08838 
(0.04865 ) 

0.09112 
( 0  00066 ) 

K  14 

-0.002643 

0.002631 
(0.001388 ) 

-0.00324 
( 0 . 00007  ) 

K  15 

0.000034 

-0.000016 
( 0 . 0000 1 3 ) 

0 . 00004 
( 0 . 00000 ) 

Cond i t i ons : 

0=1%.  R = 1 0% 

LIKALM:  50  data  points,  0L5 :  100  data  points 

All  ideal  except  Ao 
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2655,  1380,  1475,  1069,  1030, 
Convergence  is  slow  because,  even  though 
the  gradient  values  are  in  the  hundreds  or 
thousands,  the  increments  at  each  iteration  are 
very  small  due  to  the  small  size  of  the  elements 
in  the  inverse  of  the  information  matrix  (of  the 
order  of  10~8  in  the  average.  Thus  the  initial 
values  are  not  much  affected  even  after  5 
iterations.  The  standard  deviations  are  also 
cor  r espond i ng 1 y  sma 1 1 . 

2.  Runs  with  Q=1%  and  R  = 1 0%  but  R  specified  at  10  times 
its  value. 

These  runs  were  attempted  to  investigate  the  effect 
of  mi sspeci f icat ion  of  R  and  because  previous 
experiments  with  the  two-dimensional  model  showed 
that  overspecifying  R  did  affect  the  information 
matrix  and  consequently,  the  rate  of  convergence, 
much  more  than  that  of  Q. 

a.  With  Ao  equal  to  the  OLS  values,  all  other 
initial  values  as  in  the  previous  experiments, 
the  value  of  the  negative  Likelihood  function  on 
the  first  iteration  is  now  much  better  than  in 
the  first  run  (1245  as  compared  to  the  previous 
4723),  thus  indicating  a  better  fit,  but  the 
algorithm  then  starts  diverging  and  ends  up  with 
numerical  singularity. 

b.  The  same  happens  if  we  start  with  A  -  10%. 
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A  few  other  experiments  with  this  model  and  with  the 
smaller  order  models  were  tried  to  attempt  to 
correct  some  of  the  numerical  problems,  but  without 
much  success.  The  number  of  experiments  attempted 
was  very  much  limited  by  the  cost  of  the  runs,  and 
of  course  given  the  size  of  the  information  matrix 
and  the  results  of  the  experiments  with  smaller 
models,  the  estimation  of  Q  or  R  was  not  attempted. 
3.  As  a  matter  of  interest,  a  run  was  made  with  Q  =  R  = 1  % 
but  R  specified  as  10  times  its  value. 

Starting  with  A  -  10%,  the  algorithm  converges  after 
5  iterations  to  values  ver  similar  to  the  Least 
Squares  values  for  this  case.  As  expected,  when  the 
measurement  noise  is  not  high,  the  OLS  and  LIKALM 
values  are  comparable... 

3 .  Cone  1 us i on 

From  these  few  experiments  it  appears  that  the 
convergence  zone  is  much  narrower  than  in  the  previous 
cases.  The  Likelihood  surface  is  probably  very  sharp 
around  the  minimum.  The  instability  of  the  model  is  a 
major  limiting  factor.  It  is  actually  a  good  proof  of 
the  robustness  of  the  algorithm  that  it  would  still 
converge  wi th  Ao  =  A  -  10%,  given  the  behavior  of  the 
model  with  these  parameter  values,  as  illustrated  by 
Figure  X -  1 0 . 

Due  to  the  highly  nonlinear  character  of  the  model, 
to  its  instability  and  high  sensitivity  to  parameter 
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FIGURE  X-10:  Sensitivity  of  two  selected  variables  of  the  Market -Growth  model 
to  a  10%  change  in  all  parameters. 
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changes,  convergence  of  the  program  is  dependent  on  the 
choice  of  starting  values  and  encounters  many 
computational  problems.  This  is  however  a  characteristic 
of  many  iterative  gradient  algorithms.  Mehra  [112] 
notes : 

"The  use  of  Maximum  Likelihood  estimation  in 
practice  leads  to  difficult  nonlinear 
programming  problems.  It  is  not  uncommon  to  find 
situation  where  the  likelihood  surface  has 
multiple  maxima,  saddle-points,  discontinuities, 
and  singular  Hessian  in  the  parameter  space..." 

Mehra  also  points  out  in  [112]  that  the  method  used 
here  runs  into  problems  when  the  information  matrix  is 
singular  or  nealy  singular,  in  which  cases  two 
situations  may  arise: 

1.  The  computed  M_1  may  turn  out  to  be  indefinite  or 
negative  definite,  so  that  J(Ai+l  )>J(Aj)  where  J  is 
the  negative  likelihood. 

2.  The  step  size,  DA;  =  ;  »M  •  1  »g ;  6  8  is  very  large  in 

singular  directions. 

This  happens  when  the  first  form  of  the  information 
matrix  is  used69.  The  use  of  the  second  form,  however 
forces  the  information  matrix  to  remain  positive.  This 
does  not  avoid  numerical  singularity  in  any  case. 

A  few  techniques  have  been  suggested  for  handling 
singular  or  nearly  singular  information  matrices  and  are 

68  See  p.  135 

69  See  section  (IX-F.1) 
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presented  in  Mehra  [112],  They  all  involve  a  great 
number  of  further  computations,  reductions  in  the  order 
of  matrices,  etc..,  with  no  guarantee  of  convergence. 
These  are  all  complex  problems  of  numerical  analysis  and 
given  the  cost  of  the  runs  and  the  number  of  runs  that 
should  be  tried  in  order  to  obtain  any  satisfactory 
results,  no  further  experiments  were  attempted. 

This  leaves  us”  with  quite  a  di  ssappoi  nt  i  ng  picture 
of  the  performance  of  the  LIKALM  program  in  the  case  of 
the  Market -Growth  model. 
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XI.  CONCLUSION 


A.  Introduction 

The  original  purpose  of  this  thesis  was  to  attempt  to 
bridge  the  gap  between  econometric  and  system  dynamics 
mode  1 -bui Idi ng  by  providing  a  technique  allowing  for  formal 
estimation  of  parameters  in  a  nonlinear  System  Dynamics 
feedback  model,  in  the  presence  of  measurement  noise.  If 
such  estimation  proved  satisfactory  and  if  test  statistics 
associated  to  it  proved  useful  in  determining  acceptable 
confidence  in  parameter  estimates  and  guided  the  modeler 
towards  possible  better  specification,  then  the  argument 
between  econometric  and  system  dynamics  modelers  (  and  in 
particular  the  issues  of  "arbitrary"  parameters,  lack  of 
reliance  on  data  in  system  dynamics  and  the  requirements  of 
model  validation)  would  have  less  reason  to  exist. 

No  available  econometric  estimation  technique  proved 
capable  of  meeting  the  requirements  of  the  estimation  of 
parameters  in  a  nonlinear  dynamic  feedback  model  in  the 
presence  of  measurement  noise  since  the  emphasis  of 
econometric  research  has  been  on  simultaneous  equations 
systems  and  models  with  perfect  measurement.  The  recent 
shift  to  study  the  effect  of  measurement  noise  remains 
within  the  framework  of  linear  simultaneous  equations 
systems . 

On  the  other  hand,  on  the  control  side,  the  rapidly 
growing  number  and  range  of  applications  of  the  Filtering 
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Form  of  the  Maximum  Likelihood  algorithm,  which  proved  its 
capacity  to  perform  very  well  in  the  context  of  noisy 
dynamic  nonlinear  models  of  physical  systems,  suggested  that 
this  approach  might  be  useful  in  the  field  of  socio-economic 
modeling  and  provided  a  base  for  its  testing  in  this 


context . 

Indeed 

this  powerful  technique  has,  in  the  various 

experiments 

conducted  here,  proven  its  capacity  to 

dr as t i ca 1 1 y 

improve  upon  the  Least  Squares  estimates, 

provided  the  models  are  small,  perfectly  specified,  and 
sufficient  "a  priori"  knowledge  about  the  equation  and 
measurement  noise  satitistics  is  available. 

Such  detai led  knowledge  of  both  the  process  at  hand  and  the 
noise  characteristics  exists  in  most  engineering 
applications  and  this  explains  the  success  of  the  LIKALM 


approach  in 

those  fields. 

I f  however , 

we  now  move  to  socio-economic  models  where  the 

problems  at 

hand  are  typically  ill-suited  for  formal 

mathematical  treatment,  the  number  of  variables  and  the 
parameters  to  be  estimated  several  orders  of  magnitude 
larger  than  the  typical  physical  applications,  the  degrees 
of  freedom  and  uncertainties  numerous,  the  measurement  noise 
difficult  to  estimate  and  the  residual  processes  in  the 
model  specification  (which  "explain"  the  uncertainties  in 
functional  relationships)  large,  one  can  no  more  assume  "a 
priori"  knowledge  of  the  noise  statistics  nor  near  perfect 


speci f icat ion . 
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The  noise  covariance  matrices  must  then  also  be  estimated 
and  this  entails  various  identification  and  numerical 
problems.  Many  limitations  of  the  LIKALM  algorithm  become 
increasingly  critical  and  preclude  its  use  in  socio-economic 
applications,  and  in  particular  in  System  Dynamics  models. 

Some  of  these  limitations  are  outlined  below. 

B.  Limitations  of  the  LIKALM  algorithm 

1.  One  of  the  most  important  problems  in  the  applicability 
of  the  Maximum  Likelihood  algorithm  to  large-scale 
systems  is  the  great  number  of  computational  problems 
that  occur.  Mehra  [112]  deals  at  lenght  with  some  of 
these  aspects.  Convergence  and  ident i f i abi 1 i ty  problems 
are  typical  of  this  kind  of  complex  programs.  Typical 
also  are  the  great  number  of  "ad-hoc"  adjustments  and 
the  "hammering"  that  is  needed  to  get  the  algorithm  to 
converge,  which  depend  very  much  on  the  talent  of  the 
experimenter  and  his  detailed  knowledge  of  both  the 
model  at  hand  and  the  program.  A  great  fault  of  many 
complex  "packages"  of  this  kind  is  that,  in  order  to 
allow  for  maximum  generality  and  flexibility,  thus 
containing  several  different  search  algorithms,  for 
example,  they  become  so  intricate  as  to  preclude  their 
use  by  anyone  other  than  their  creator 

2.  Once  the  algorithm  has  converged,  there  is  no  guarantee 
that  it  has  converged  to  the  absolute  maximum  of  the 
Likelihood  function  which  is  the  only  one  for  which  the 
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estimate  is  proven  to  be  unbiased,  consistent  and 
efficient . 

3.  All  the  theoretical  results  are  for  linear  systems.  The 
degree  to  which  the  "extended"  Kalman  Filter 
approximation  is  valid  is  not  clear  and  needs  to  be 
investigated . 

4.  The  use  of  the  Kalman  Filter  is  dependent  upon  the 
perfect  specification  of  the  model.  What  the  values  of 
the  parameters  mean  in  the  case  of  imperfect 
specification  is  still  to  be  examined. 

Another  limitation  inherent  to  the  Kalman  Filter  is  that 
this  formulation  requires  that  all  the  state  equations 
be  included  in  the  model  of  the  system  to  be  estimated. 
In  other  words,  the  whole  system  has  to  be  entered  in 
the  program  even  if  only  a  few  parameters  are  to  be 
estimated.  A  System  dynamics  model  typically  includes  up 
to  200  state  equations.  All  of  these  would  have  to  be 
included  in  the  formulation  of  the  Kalman  filter  unless 
some  way  of  separating  strongly  decoupled  "subsystems" 
in  the  model  can  be  devised. 

5.  Apart  from  the  fact  that  the  diagonal  elements  of  the 
inverse  of  the  information  matrix  provide  estimates  of 
the  variance  of  the  estimates,  nothing  much  has  been 
said  here  about  various  statistical  tests  of 
significance  or  overall  fit  and  their  applicability. 

6.  Last  but  not  least,  one  of  the  limiting  factors  that 
assumed  the  greatest  importance  in  the  last  experiment 
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is  the  cost  of  running  the  Market -Growth  model. 

As  it  stands,  the  program  with  the  built-in  "hard" 
limits  of  10  equations  and  40  parameters,  because  of  the 
many  matrix  computations  and  the  size  of  the  matrices 
that  appear  in  the  sensitivity  equations,  requires  over 
1.5  million  bytes  of  storage  space.  In  the  process  of 
computing  the  information  matrix,  the  four-dimensional 
matrix 

SS  =  Sz- 1  •  ( dSz/dA )  »Sz" 1  •  ( dSz/afA ) 
of  maximum  dimensions  (10,10,40,40)  occupies  by  itself 
640,000  bytes. 

One  iteration  through  50  data  points  with  the 
Market-model  takes  114  CPU  seconds  which,  at  a  low  rate 
of  $15  for  20  CPU  seconds  at  the  University  of  Alberta, 
costs  about  $60.  A  minimum  of  3  to  4  iterations  is 
needed  to  observe  convergence,  which  puts  a  run  to  an 
average  of  about  $200. 

This  is  using  only  50  data  points.  The  Maximum 
Likelihood  method  needs  long  data  sequences,  tipycally 
from  300  to  1000  data  points,  since  all  its  properties 
are  asymptotic. 

This  program  was  written  without  regard  to  any  form 
of  optimization.  It  can  confidently  be  stated  that  the 
storage  required,  and  consequently  the  cost  of  running 
the  program,  can  be  cut  by  a  factor  of  5  to  10. 

It  remains  nonetheless  that,  even  though  this  LIKALM 
program  is  vey  inefficient,  the  high  cost  is  a  general 
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feature  of  this  aproach.  Mehra  [84]  reports  a  cost  of  1 
minute  of  CPU  time  per  iteration  for  a  model  with  four 
state-variables,  23  parameters  and  about  200  data 
points.  Similar  cost  limitations  are  emphasized  for  the 
GPSI E  program  [ 85 ] . 

Even  if  all  the  above  limitations  were  solved,  the 
costs  involved  for  a  thorough  formal  estimation  of  a 
large-scale  model  would  make  the  use  of  this  algorithm 
extremely  impractical,  bearing  in  mind  that  one  run  of  a 
200  equation  System  Dynamics  model  with  the  Dynamo 
compiler  costs  only  about  $2. 

From  the  above  limitations  and  the  experience  gained 
from  the  experiments,  it  appears  that  such  formal  parameter 
estimation,  which  to  be  consistent  would  have  to  allow  for 
all  the  non-ideal  situations  described  in  Figure  VI-2,  is 
not  the  natural  approach  for  the  validation  of  System 
Dynamic  models. 

It  might  be  useful  at  this  point  to  restate  briefly 
some  significant  aspects  of  the  System  Dynamics  methodology. 

The  models  are  built  with  emphasis  on  real-world 
meaning  for  all  parameters,  on  understanding  model  behavior 
in  the  light  of  the  complex  structure  of  feedback  mechanisms 
rather  than  on  prediction,  intuitive  understanding  of  the 
processes  at  work  rather  than  on  heavy  reliance  on  data... 
These  models  still  have  to  be  validated  through  contact  with 
data.  Flowever ,  in  light  of  these  characteristics,  it 
certainly  is  more  worthwhile  investigating  other  means  of 
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model  validation,  more  suited  to  the  modeling  approach  and 
the  nature  of  the  models  at  hand  than  to  try  to  force  a 
ill-suited  approach  into  behaving  properly. 

C.  Suggestions  for  Further  Research 

In  their  effort  to  demonstrate  the  fallacy  of  using 
single  equation  statistical  tests  for  the  selection  of  model 
variables,  Mass  and  Senge  [ 7  0 ] 7  0  show  that  such  statistical 
tests  "should  not  be  viewed  as  tests  of  model  specification 
per  se ,  but  as  tests  of  a  particular  type  of  data 
usefulness:  they  warn  the  modeler  when  available  data  do  not 
permit  accurate  estimation  of  a  model  parameter". 

However,  they  illustrate  the  fact  that  a  model 
relationship  may  be  difficult  to  estimate  yet  extremely 
important  for  overall  behavior  and  thus  state  that  "tests 
which  analyze  the  impact  of  individual  variables  on  model 
behavior  are  beter  suited,  both  theoretically  and 
operationally,  to  the  task  of  selecting  model  variables". 

Following  this  line  of  thought  and  in  an  effort  to 
formalize  such  a  subjective  process  as  sensitivity  testing, 
Ford  et  al .  [71]  have  recently  developed  a  technique  that 
makes  sensitivity  testing  less  difficult  and  more  automatic. 

In  testing  a  given  model,  the  behavior  of  output 
variables  over  the  simulation  period  for  variations  in  the 
parameters  is  collected  for  a  great  number  of  runs  (100). 
This  vast  amount  of  raw  information  is  then  analyzed  through 


7  °Refer  to  section  (IV-C.1) 
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formal  and  informal  methods. 

In  the  formal  analysis,  partial  correlation 
coefficients  of  an  output  variable  and  all  inputs  are 
calculated  and  filtered  over  time  (to  isolate  "persistent" 
correlations)  to  be  used  in  selecting  Key  input  variables 
that  appear  to  have  the  most  influence  on  the  output 
variable  of  interest. 

In  the  informal  analysis,  the  partial  correlation 
coefficients  are  used  in  combination  with  the  various 
display  options  to  examine  the  model's  behavior  under  widely 
varying  input  conditions. 

These  two  processes,  involving  both  the  model  builder 
and  the  statistician,  allow  a  model's  behavior  and  weak 
points  to  be  assessed  thoroughly  and  focus  research,  careful 
verification  and  refinement  on  the  sensitive  relationships 
of  the  model . 

On  the  other  hand,  much  tedious  theoretical  work  is 
still  being  carried  out  in  the  field  of  formal  statistical 
estimation  of  socio-economic  models,  and  in  particular  in 
the  use  of  the  Filtering  Form  of  the  Maximum  Likelihood 
estimator  and  indeed,  and  many  researchers  throughout  the 
world  are  continually  refining  our  theoretical  understanding 
of  the  issues  involved. 

These  efforts  however  are  attempts  at  stretching  out  to 
its  limits  the  practice  of  parameter  estimation  as  we  now 
concieve  it.  The  past  hundred  years  or  so  have  seen  a 
dramatic  break  in  "normal"  trends.  We  are  living  in  an  age 
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of  transition  that  must  be  characterized  by  an  equivalent 
change  in  the  way  we  look  at  the  world,  or  a  change  in 
"paradigm" 7 1 . 

The  ways  of  looking  at  things  that  we  have  in  the 
past  accepted  as  common  sense  really  do  not  work 
under  all  circumstances  and  it  is  very  likely  that 
we  have  reached  a  period  of  history  when  they  do  not 
match  the  types  of  processes  which  are  going  in  the 
world  at  1 arge ..."  [117] 

The  many  limitations  of  traditional  mode  1 -bui Idi ng  and 
parameter  estimation  call  for  a  reassessment  of  the  process 
of  mode  1 -bui ldi ng  itself.  Causality  in  society  is  a  much 
more  "fuzzy"  concept  than  in  the  physical  sciences  and  yet, 
we  are  carrying  our  physical  way  of  thinking  into  the  field 
of  socio-economic  models.  Rather  than  remaining  within  the 
framework  of  deterministic  relationships  between  variables, 
we  should  be  exploring  ways  that  allow  for  more  flexibility. 
The  investigation  of  causality  trough  the  use  of  information 
theoretic  concepts  such  as  the  mutual  information  entropy 
between  variables  and  other  less  restrictive  approaches 
might  well  prove  to  be  more  suited  for  the  processes  at 
hand . 


7 iSee  T.  KUHN  [116] 
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